Charlotte’s rental market, often assumed to be steady, reveals a layered pricing system shaped by unit design, amenity mix, and neighborhood context. From Uptown’s compact studios to SouthPark’s larger suburban formats, rent behavior reflects more than square footage: it is also influenced by branding, efficiency, and geographic positioning.

We began by grounding the analysis in descriptive statistics, which clarified the structural contours of rent and unit configuration. The geospatial phase then added a location‑aware lens, showing how density, boundaries, and clustering shape submarket identity and pricing logic. Building on these foundations, the regression phase quantified the statistical weight of size, amenities, and neighborhood effects. These models validated the structural relationships behind price per square foot (PPSF) and revealed how Charlotte’s market organized itself across scales, confirming that pricing patterns were systematic and interpretable rather than random.

Key Takeaways from the Descriptive Phase

The descriptive statistics stage built the foundation for understanding Charlotte’s rental market, emphasizing its segmentation, variability, and intentional design. By examining measures of averages, spread, and correlation, we uncovered central patterns in rent levels and unit configurations.

Neighborhood analysis revealed clear geographic stratification. Uptown and West Charlotte stood out for premium pricing and volatility, while SouthPark and NoDa reflected more affordable and stable conditions.

PPSF trends highlighted unit-level segmentation. Studios and one-bedrooms consistently achieved higher PPSF, particularly in central districts, while larger units displayed greater variability, underscoring the trade-off between space and affordability.

Complex-level differences showed how developers shape offerings. Narrow rent ranges pointed to standardized layouts, whereas broader spreads suggested diverse unit mixes and flexible pricing strategies.
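The narrow-versus-broad rent range comparison can be reproduced with a short groupby. This is a minimal pandas sketch that assumes the `Complex` and `Rent` columns shown in the data preview. The toy rows and the complex names "A" and "B" are hypothetical.

```python
import pandas as pd

# Hypothetical toy data; in the notebook this would be the `apt` DataFrame
# with its Complex and Rent columns.
units = pd.DataFrame({
    "Complex": ["A", "A", "A", "B", "B", "B"],
    "Rent": [1500, 1520, 1540, 1200, 1800, 2600],
})

# Spread of rents within each complex: a narrow range hints at standardized
# layouts, a wide range at a more diverse unit mix.
rent_spread = (
    units.groupby("Complex")["Rent"]
    .agg(min_rent="min", max_rent="max", std_rent="std")
    .assign(rent_range=lambda d: d["max_rent"] - d["min_rent"])
    .sort_values("rent_range")
)
print(rent_spread)
```

Sorting by `rent_range` puts the most standardized complexes first. The standard deviation column is included because a single extreme unit can widen the range without changing the typical spread.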

Examining square footage per bedroom revealed design intent. One-bedrooms provided the most space per bedroom, while three-bedrooms offered the least, reflecting efficiency goals and tenant targeting.
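Square footage per bedroom can be computed as shown below. This is a sketch on hypothetical toy rows. It assumes that studios (0 bedrooms) are excluded because dividing by zero bedrooms is undefined. The original analysis may have handled studios differently.

```python
import pandas as pd

# Hypothetical toy units; the notebook's `apt` has Bedrooms and Sqft columns.
units = pd.DataFrame({
    "Bedrooms": [0, 1, 1, 2, 2, 3],
    "Sqft": [550, 750, 800, 1150, 1100, 1350],
})

# Studios (0 bedrooms) are dropped to avoid dividing by zero; treating a
# studio as a single room would be an alternative convention.
with_beds = units[units["Bedrooms"] >= 1].copy()
with_beds["sqft_per_bedroom"] = with_beds["Sqft"] / with_beds["Bedrooms"]

# Average space per bedroom for each bedroom count.
per_bed = with_beds.groupby("Bedrooms")["sqft_per_bedroom"].mean()
print(per_bed)
```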

Outlier detection using IQR and z-scores sharpened the analysis, identifying atypical units and supporting both segmentation and data cleaning.
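Both outlier rules fit in a few lines of pandas. In the sketch below, the PPSF values are hypothetical. The z-score cutoff of 2.0 is an assumed value: with only eight points, a z-score computed with the population standard deviation cannot exceed √7 ≈ 2.65, so the common cutoff of 3.0 could never fire on a sample this small.

```python
import pandas as pd

# Hypothetical PPSF values; in the notebook these would come from
# apt["price_per_sqft"].
ppsf = pd.Series([2.1, 2.2, 2.3, 2.25, 2.4, 2.35, 2.2, 4.8], name="price_per_sqft")

# IQR rule: flag values more than 1.5 * IQR below Q1 or above Q3.
q1, q3 = ppsf.quantile([0.25, 0.75])
iqr = q3 - q1
iqr_outlier = (ppsf < q1 - 1.5 * iqr) | (ppsf > q3 + 1.5 * iqr)

# z-score rule: flag values more than z_cut population standard deviations
# from the mean (z_cut is an assumed cutoff for this small toy sample).
z = (ppsf - ppsf.mean()) / ppsf.std(ddof=0)
z_cut = 2.0
z_outlier = z.abs() > z_cut

print(pd.DataFrame({"ppsf": ppsf, "iqr_outlier": iqr_outlier,
                    "z": z.round(2), "z_outlier": z_outlier}))
```

The IQR rule relies on quartiles, so it is robust to the outlier it is looking for. The z-score rule uses a mean and standard deviation that the outlier itself inflates. When the two rules disagree, that disagreement is itself informative.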

Correlation analysis quantified structural relationships, showing moderate links among rent, square footage, and bedroom count, while amenities played a secondary role in pricing.
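A Pearson correlation matrix of this kind takes a single pandas call. The sketch below uses hypothetical toy rows standing in for the notebook’s `Rent`, `Sqft`, `Bedrooms`, and `laundry` columns.

```python
import pandas as pd

# Hypothetical toy data standing in for apt[["Rent", "Sqft", "Bedrooms", "laundry"]].
units = pd.DataFrame({
    "Rent": [1400, 1650, 1700, 2100, 2300, 2700],
    "Sqft": [600, 750, 780, 1050, 1150, 1350],
    "Bedrooms": [0, 1, 1, 2, 2, 3],
    "laundry": [1, 0, 1, 1, 0, 1],
})

# Pairwise Pearson correlations between rent, size, layout, and an amenity flag.
corr = units.corr(method="pearson")
print(corr.round(2))
```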

Together, these findings confirm that Charlotte’s rental market is both varied and strategically organized. This statistical groundwork sets the stage for spatial and predictive modeling, beginning with geospatial analysis and extending into regression and decision tree frameworks.

Key Takeaways from the Geospatial Phase

The geospatial analysis translated Charlotte’s rental data into a location-aware narrative, revealing how geography, density, and unit-level variation shape pricing logic and submarket identity.

Density maps highlighted concentrated rental activity in Uptown, South End, and transit-accessible corridors. These areas feature compact layouts, walkable environments, and elevated PPSF, reinforcing their role as Charlotte’s luxury core.

PPSF gradients revealed affordability corridors and pricing anomalies. Cooler zones such as Mallard Creek and Tyvola Tapestry signaled value-oriented inventory, while isolated high-PPSF units suggested branded offerings or transitional submarkets.

Overlaying neighborhood boundaries added interpretive precision. In some cases, PPSF aligned closely with neighborhood borders, validating expectations. In others, pricing spilled across boundaries, pointing to segmentation opportunities and emerging micro-neighborhoods.

DBSCAN clustering identified three distinct submarkets:

  • Luxury Core: High-PPSF units in central, walkable districts

  • Amenity-Rich Midrange: Moderately priced units in transitional zones

  • Budget Fringe: Larger, lower-cost units in less central areas
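The sketch below shows one way to run DBSCAN on geographic points with scikit-learn’s haversine metric. It rests on several assumptions. The data preview has street addresses but no coordinates, so the latitude/longitude values here are hypothetical and presuppose geocoding upstream. The 1 km radius and `min_samples=3` are illustrative choices. The sketch clusters on location only; labeling the resulting clusters as Luxury Core, Amenity-Rich Midrange, or Budget Fringe would come from each cluster’s PPSF profile, which is not shown here.

```python
import numpy as np
from sklearn.cluster import DBSCAN

EARTH_RADIUS_KM = 6371.0088

# Hypothetical (lat, lon) coordinates in degrees.
coords_deg = np.array([
    [35.2270, -80.8430], [35.2275, -80.8440], [35.2265, -80.8425],  # tight group 1
    [35.1510, -80.8240], [35.1515, -80.8250], [35.1505, -80.8235],  # tight group 2
    [35.3400, -80.7300],                                            # isolated point
])

eps_km = 1.0  # assumed neighborhood radius
db = DBSCAN(
    eps=eps_km / EARTH_RADIUS_KM,  # haversine distances are in radians
    min_samples=3,                 # counts the point itself
    metric="haversine",
    algorithm="ball_tree",
).fit(np.radians(coords_deg))      # haversine expects [lat, lon] in radians

labels = db.labels_  # -1 marks noise points
print(labels)
```

Dividing a kilometer radius by Earth’s radius converts it into the angular units the haversine metric returns. Without this step, `eps` would silently mean something very different.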

Outlier detection flagged Moderna Liberty Row as a high-PPSF anomaly within a midrange zone, suggesting branded finishes or micro-location advantages. This may indicate early submarket transition or niche emergence.
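Flagging a high-PPSF unit *within a midrange zone* requires comparing each unit with its own cluster rather than with the whole city. The sketch below computes a within-cluster z-score on hypothetical units that are assumed to already carry a cluster label. The 1.4 cutoff is an arbitrary choice for this tiny sample.

```python
import pandas as pd

# Hypothetical units that already carry a spatial cluster label (e.g. from DBSCAN).
units = pd.DataFrame({
    "Complex": ["C1", "C2", "C3", "C4", "C5", "C6", "C7", "C8"],
    "cluster": [0, 0, 0, 0, 1, 1, 1, 1],
    "price_per_sqft": [1.80, 1.90, 2.00, 2.90, 2.80, 2.90, 3.00, 2.95],
})

# Standardize PPSF against each unit's own cluster rather than the whole city.
by_cluster = units.groupby("cluster")["price_per_sqft"]
units["local_z"] = (
    (units["price_per_sqft"] - by_cluster.transform("mean"))
    / by_cluster.transform("std")
)

# Assumed cutoff; with four units per cluster, |z| (sample std) cannot exceed 1.5.
units["local_outlier"] = units["local_z"] > 1.4
print(units)
```

The same $2.90 PPSF is flagged for C4, which sits in a cheaper cluster, but not for C6, whose neighbors are priced similarly. A location-relative check is designed to surface exactly this distinction.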

Together, these spatial insights confirm that Charlotte’s rental market is not only statistically segmented but also geographically structured. The analysis supports location-sensitive design, targeted investment, and context-aware pricing strategies. This prepares the ground for regression modeling, which will quantify spatial dynamics and validate the structural logic behind PPSF.

Key Takeaways from the Regression Phase

The regression analysis phase quantified Charlotte’s rental pricing logic, revealing a structured and multi‑dimensional market shaped by unit traits, amenity density, geographic desirability, and property‑level nuance. Across citywide, bedroom, neighborhood, and complex models, regressions delivered strong statistical performance and clarified how value is distributed at different scales.

At the citywide level, regressions explained over 95 percent of PPSF variation, confirming rent, square footage, and amenities as foundational drivers. Smaller units consistently commanded higher PPSF due to efficiency and concentrated demand, while features such as laundry, EV charging, and secure access added measurable premiums. Location effects remained stable, with Uptown, SouthPark, and South End carrying consistent pricing advantages, and standout complexes like Bond on Mint and Moderna Liberty Row reinforcing brand‑driven value.
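The following minimal scikit-learn sketch shows how a regression separates a size effect from an amenity premium. The data-generating process (PPSF falling with square footage plus a fixed laundry premium) is an assumption for illustration only and does not reproduce the notebook’s models. Rent is deliberately left out of the toy design. Because PPSF is rent divided by square footage, a model that includes both Rent and Sqft as predictors may achieve a high R² partly for mechanical reasons.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n = 200
sqft = rng.uniform(500, 1500, n)
laundry = rng.integers(0, 2, n)  # 0/1 amenity flag

# Assumed data-generating process, purely for illustration:
# PPSF falls with size and carries a fixed premium for in-unit laundry.
ppsf = 3.2 - 0.0012 * sqft + 0.15 * laundry + rng.normal(0, 0.05, n)

X = np.column_stack([sqft, laundry])
model = LinearRegression().fit(X, ppsf)
print("coefficients:", model.coef_.round(4))
print("R²:", round(model.score(X, ppsf), 3))
```

A negative Sqft coefficient matches the pattern of smaller units commanding higher PPSF. The positive laundry coefficient is the amenity premium in dollars per square foot.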

Bedroom‑level regressions highlighted how pricing logic shifts with unit size. Studios and one‑bedrooms were highly sensitive to layout and amenity density, while larger formats reflected more selective valuation patterns. Bathrooms and certain amenities lost significance in larger units, underscoring how expectations evolve with scale.

Neighborhood‑level regressions surfaced spatial dynamics beyond citywide averages. While core predictors remained stable, amenity effects and complex representation varied by submarket. Uptown’s lower R² reflected a more diffuse pricing structure shaped by mixed‑use development and fragmented branding, with residual analysis confirming statistical reliability despite added complexity.

Complex‑level regressions provided a close‑up view of property‑specific pricing strategies. Some complexes emphasized rent thresholds, others prioritized layout efficiency or bundled amenities. Visual tools such as heatmaps and radar charts translated these distinct logics into intuitive narratives, making property‑level pricing both accessible and actionable.

Together, these regressions validated the broader model while elevating local nuances. They confirmed that Charlotte’s rental market operates with a high degree of internal logic, where pricing reflects consistent structural relationships across scales. For stakeholders, this layered framework supports precise valuation, targeted investment, and responsive policy design, ensuring the market is interpreted not as a monolith but as a mosaic of interlocking pricing strategies.

Introduction to Decision Tree Modeling

In this section, we apply decision tree modeling to reveal the rule‑based pathways that shape rent and price per square foot (PPSF) in Charlotte’s multifamily rental market. Unlike regression, which quantifies continuous relationships, decision trees emphasize interpretability by segmenting the market into discrete thresholds and personas. Each split represents a decision rule, showing how combinations of size, rent, and complex identity interact to define pricing tiers.
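Each split in a regression tree is chosen to make the two resulting groups as internally homogeneous as possible. The sketch below is a simplified version of the CART criterion on a single feature: it tries every midpoint between adjacent distinct values and keeps the one that minimizes the total squared error around each side’s mean. The `best_split` helper and the toy data are hypothetical. The final lines check the result against a one-split scikit-learn tree, which also places thresholds at midpoints.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def best_split(x, y):
    """Return (threshold, total_sse) for the single split on x that minimizes
    the summed squared error of y around each child's mean."""
    order = np.argsort(x)
    xs, ys = x[order], y[order]
    best = (None, np.inf)
    for i in range(1, len(xs)):
        if xs[i] == xs[i - 1]:
            continue  # no threshold can separate equal values
        left, right = ys[:i], ys[i:]
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if sse < best[1]:
            best = ((xs[i - 1] + xs[i]) / 2, sse)  # midpoint threshold
    return best

sqft = np.array([500.0, 600.0, 700.0, 1100.0, 1200.0, 1300.0])
ppsf = np.array([3.0, 2.9, 3.1, 1.9, 2.0, 2.1])
threshold, sse = best_split(sqft, ppsf)
print(threshold, round(sse, 3))

# A depth-1 scikit-learn tree should choose the same threshold.
stump = DecisionTreeRegressor(max_depth=1).fit(sqft.reshape(-1, 1), ppsf)
print(stump.tree_.threshold[0])
```

A full tree repeats this search over every feature at every node and stops according to settings such as `max_depth` and `min_samples_leaf`.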

We begin with citywide trees to establish the dominant drivers of rent and PPSF, then segment by unit size and complex identity to surface localized thresholds. This layered approach highlights both broad structural logic and the categorical distinctions that stakeholders encounter in practice. By mapping how units fall into value, midrange, and premium brackets, the trees provide a transparent framework for understanding market segmentation.

Decision trees serve as both explanatory and diagnostic tools. They expose nonlinear effects, identify behavioral thresholds, and translate statistical splits into intuitive narratives. By pairing descriptive statistics with rule‑based modeling, we move from observation to segmentation, showing how Charlotte’s rental landscape organizes itself not just through averages, but through branching pathways that mirror tenant decision‑making and market positioning.

In [44]:
########>> INITIALIZE <<########

# === Basic Operation Libraries ===
import os
import sys
import ast
import datetime
import re
import time

# === Data Analysis Libraries ===
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import statsmodels.api as sm
import scipy.stats as stats
%matplotlib inline

# === Machine Learning Libraries ===
from sklearn.tree import DecisionTreeRegressor, plot_tree, export_text, _tree
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

# === Display Settings for Jupyter ===
from IPython.display import display, HTML

# === Display Settings for Pandas ===
pd.set_option('display.html.table_schema', True)
pd.set_option('expand_frame_repr', True)
pd.set_option('display.max_colwidth', 200)
pd.options.display.html.use_mathjax = False

# === Manage Warnings ===
import warnings
warnings.filterwarnings('ignore')

# === Completion Timestamp ===
print("\n{:<5} : {}".format("Finished", str(datetime.datetime.now())))
Finished : 2025-11-25 16:06:03.892187
In [45]:
apt = pd.read_csv("C:\\Users\\alexp\\Charlotte_Apartments.csv")
In [46]:
apt.head()
Out[46]:
Complex Address Unit_Variant Bedrooms Bathrooms Rent Sqft Amenities Website Neighborhood ... parking ev_charging elevator secure_access wifi wifi_common trash_pickup renters_insurance packages recycling
0 Moderna Liberty Row 7740 Liberty Row Dr, Charlotte, NC 28210 S01 0.0 1.0 1469.0 651.0 In-unit washer/dryer; High-speed internet in common areas; Controlled access bicycle storage; Additional storage available; Resort-style pool; 24-hour fitness center; Game room with billiards, pok... https://www.moderalibertyrow.com/ SouthPark ... 0 0 0 0 0 1 0 0 0 0
1 Moderna Liberty Row 7740 Liberty Row Dr, Charlotte, NC 28210 A01 1.0 1.0 1707.0 747.0 In-unit washer/dryer; High-speed internet in common areas; Controlled access bicycle storage; Additional storage available; Resort-style pool; 24-hour fitness center; Game room with billiards, pok... https://www.moderalibertyrow.com/ SouthPark ... 0 0 0 0 0 1 0 0 0 0
2 Moderna Liberty Row 7740 Liberty Row Dr, Charlotte, NC 28210 A02 1.0 1.0 1707.0 747.0 In-unit washer/dryer; High-speed internet in common areas; Controlled access bicycle storage; Additional storage available; Resort-style pool; 24-hour fitness center; Game room with billiards, pok... https://www.moderalibertyrow.com/ SouthPark ... 0 0 0 0 0 1 0 0 0 0
3 Moderna Liberty Row 7740 Liberty Row Dr, Charlotte, NC 28210 A03 1.0 1.0 1532.0 801.0 In-unit washer/dryer; High-speed internet in common areas; Controlled access bicycle storage; Additional storage available; Resort-style pool; 24-hour fitness center; Game room with billiards, pok... https://www.moderalibertyrow.com/ SouthPark ... 0 0 0 0 0 1 0 0 0 0
4 Moderna Liberty Row 7740 Liberty Row Dr, Charlotte, NC 28210 A04 1.0 1.0 1766.0 861.0 In-unit washer/dryer; High-speed internet in common areas; Controlled access bicycle storage; Additional storage available; Resort-style pool; 24-hour fitness center; Game room with billiards, pok... https://www.moderalibertyrow.com/ SouthPark ... 0 0 0 0 0 1 0 0 0 0

5 rows × 25 columns

PPSF

In [47]:
# Drop non-predictive or text-heavy columns
apt_cleaned = apt.drop(columns=['Address', 'Unit_Variant', 'Amenities', 'Website'])

# Encode categorical columns
apt_encoded = pd.get_dummies(apt_cleaned, columns=['Complex', 'Neighborhood'], drop_first=False)

# Define target and features for PPSF (overall)
y_ppsf = apt_encoded['price_per_sqft']
X_ppsf = apt_encoded.drop(columns=['price_per_sqft'])

# Train/test split (distinct variable names for PPSF overall)
X_train_p, X_test_p, y_train_p, y_test_p = train_test_split(
    X_ppsf, y_ppsf, test_size=0.2, random_state=42
)

# Fit decision tree for PPSF overall
tree_ppsf = DecisionTreeRegressor(max_depth=3, min_samples_leaf=5, random_state=42)
tree_ppsf.fit(X_train_p, y_train_p)

# Visualize the tree
plt.figure(figsize=(36, 20))
plot_tree(
    tree_ppsf,
    feature_names=X_ppsf.columns,
    filled=True,
    rounded=True,
    fontsize=14
)
plt.title("PPSF Decision Tree — Overall", fontsize=18)
plt.show()

# Feature importance
importances_ppsf = pd.Series(tree_ppsf.feature_importances_, index=X_ppsf.columns)
print(importances_ppsf.sort_values(ascending=False))

# R² score for PPSF overall tree
r2_tree_ppsf = tree_ppsf.score(X_test_p, y_test_p)
print(f"R² for PPSF Decision Tree — Overall: {r2_tree_ppsf:.3f}")
[Figure: PPSF decision tree, overall]
Complex_The Landon              0.560687
Complex_Bond on Mint            0.254396
Sqft                            0.183402
Bedrooms                        0.001516
Rent                            0.000000
Bathrooms                       0.000000
gym                             0.000000
laundry                         0.000000
parking                         0.000000
ev_charging                     0.000000
elevator                        0.000000
pool                            0.000000
secure_access                   0.000000
wifi                            0.000000
trash_pickup                    0.000000
wifi_common                     0.000000
renters_insurance               0.000000
packages                        0.000000
recycling                       0.000000
pets                            0.000000
Complex_Broadstone Craft        0.000000
Complex_Ello House              0.000000
Complex_Moderna Liberty Row     0.000000
Complex_Hawkins Press           0.000000
Complex_Novel Mallard Creek     0.000000
Complex_Solis Midtown           0.000000
Complex_The Henry               0.000000
Complex_The Leo LoSo            0.000000
Complex_The Perch               0.000000
Complex_Tyvola Tapestry         0.000000
Neighborhood_NoDa               0.000000
Neighborhood_South End          0.000000
Neighborhood_SouthPark          0.000000
Neighborhood_University City    0.000000
Neighborhood_Uptown             0.000000
Neighborhood_West Charlotte     0.000000
dtype: float64
R² for PPSF Decision Tree — Overall: 0.606

Charlotte’s PPSF decision tree reveals a segmented, rule-based structure that prioritizes categorical identity over granular features. With a test-set R² of 0.606, the model explains a meaningful share of the variation in price per square foot while offering a transparent, interpretable view of how value is distributed across the city’s rental landscape.

Complex identity dominates the tree’s logic. The Landon alone accounts for about 56 percent of the tree’s total impurity reduction, followed by Bond on Mint (about 25 percent) and unit size (about 18 percent). These splits suggest that PPSF in Charlotte is shaped more by which complex a unit belongs to, and therefore where it sits, than by its layout, amenities, or bedroom count.

Unit size plays a secondary role, reinforcing a familiar pattern: smaller units tend to command higher PPSF, while larger ones dilute price per square foot. Square footage carries far less weight than the two complex indicators combined, indicating that size refines predictions within complexes rather than defining the main segments.

Other features contribute little or nothing. Bedrooms register only a marginal importance (about 0.002), while bathrooms, amenities, and neighborhood indicators register none. Their absence suggests either redundancy with stronger predictors or insufficient variance within the dataset. Unlike regression, which assigns a coefficient to every feature, a depth-limited decision tree uses only the most decisive splits, resulting in a sparse but interpretable structure.

This simplicity is both a strength and a limitation. The tree offers a clear narrative: “If it’s The Landon, expect lower PPSF; if it’s Bond on Mint, expect higher.” But it lacks the nuance to capture overlapping effects or subtle interactions. For stakeholders, this model serves as a diagnostic lens, highlighting which complexes drive pricing and where segmentation is most pronounced.
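The importance values printed above are scikit-learn’s impurity-based importances. Each feature receives credit for the weighted drop in squared error at every split that uses it, and the totals are normalized to sum to 1. The sketch below recomputes that quantity from a fitted tree’s `tree_` arrays on hypothetical synthetic data and checks it against `feature_importances_`. The `impurity_importances` helper is illustrative and not part of the notebook.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def impurity_importances(model):
    """Sum each split's weighted impurity decrease per feature, then normalize."""
    t = model.tree_
    w = t.weighted_n_node_samples
    imp = np.zeros(t.n_features)
    for node in range(t.node_count):
        left, right = t.children_left[node], t.children_right[node]
        if left == -1:  # -1 marks a leaf: no split, no contribution
            continue
        imp[t.feature[node]] += (
            w[node] * t.impurity[node]
            - w[left] * t.impurity[left]
            - w[right] * t.impurity[right]
        )
    total = imp.sum()
    return imp / total if total > 0 else imp

# Hypothetical data: feature 0 drives a step change, feature 1 a mild slope,
# feature 2 is pure noise.
rng = np.random.default_rng(42)
X = rng.uniform(0, 1, size=(300, 3))
y = 2.0 * (X[:, 0] > 0.5) + 0.5 * X[:, 1] + rng.normal(0, 0.05, 300)
model = DecisionTreeRegressor(max_depth=3, min_samples_leaf=5, random_state=42).fit(X, y)

manual = impurity_importances(model)
print(manual.round(3), model.feature_importances_.round(3))
```

Because these importances are computed on the training data, they describe how the tree was built, not how well each feature generalizes. A share such as The Landon’s 0.56 is best read as a share of training-set error reduction.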

PPSF Decision Tree Leaf Summary

In [48]:
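def profile_tree_segments(tree, X, y, original_df, target_col, unit_col="Sqft", complex_col="Complex", neighborhood_col="Neighborhood"):
    """Summarize each leaf of a fitted tree: mean target, mean unit size,
    most common complex and neighborhood, and row count.

    `y` is accepted but not used; averages are computed from `original_df`,
    whose index must align with `X`.
    """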
    # Get leaf node assignment for each row in X
    leaf_ids = tree.apply(X)
    
    profiles = []
    for leaf in np.unique(leaf_ids):
        mask = leaf_ids == leaf
        # Use the same index as X to select from original_df
        segment_df = original_df.loc[X.index[mask]]
        
        avg_target = segment_df[target_col].mean()
        avg_size   = segment_df[unit_col].mean()
        
        dominant_complex = segment_df[complex_col].mode()[0] if not segment_df.empty else None
        dominant_neigh   = segment_df[neighborhood_col].mode()[0] if not segment_df.empty else None
        
        profiles.append({
            "Leaf_ID": leaf,
            "Avg_" + target_col: round(avg_target, 2),
            "Typical_Size": round(avg_size, 0),
            "Dominant_Complex": dominant_complex,
            "Dominant_Neighborhood": dominant_neigh,
            "Count": len(segment_df)
        })
    
    return pd.DataFrame(profiles)

# Example usage for PPSF overall tree
segment_profiles_ppsf = profile_tree_segments(
    tree_ppsf,
    X_train_p,   # features used to fit
    y_train_p,   # target
    apt_cleaned, # original df with Sqft, Complex, Neighborhood
    target_col="price_per_sqft"
)

print(segment_profiles_ppsf)
   Leaf_ID  Avg_price_per_sqft  Typical_Size Dominant_Complex Dominant_Neighborhood  Count
0        3                2.43         721.0       Ello House             South End    103
1        4                2.02        1205.0     The Leo LoSo             South End     54
2        6                3.23         631.0     Bond on Mint                Uptown     10
3        7                2.88        1134.0     Bond on Mint                Uptown      7
4       10                1.58         769.0       The Landon             SouthPark      5
5       11                1.38         980.0       The Landon             SouthPark      6
6       13                1.12        1215.0       The Landon             SouthPark      6
7       14                1.26        1338.0       The Landon             SouthPark      5

Charlotte’s PPSF decision tree produces clear, leaf‑level personas that segment the rental market into distinct unit profiles. With an R² of 0.606, the model explains a meaningful share of variation in price per square foot, offering a transparent lens into how value clusters across complexes and neighborhoods.

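The largest leaf, 103 smaller units (≤ 1,001 sqft) outside The Landon and Bond on Mint, forms a South End “starter premium” segment averaging 721 square feet and about $\$2.43$ PPSF. Ello House is its most common complex, but the leaf pools several properties. Larger units in the same branch, most often at The Leo LoSo, dilute PPSF to about $\$2.02$, reflecting the trade-off between size and efficiency. Uptown’s Bond on Mint commands the highest premiums, with smaller units at $\$3.23$ PPSF and larger ones at $\$2.88$, underscoring the strength of location over layout. In contrast, The Landon in SouthPark anchors the value tier, with units across sizes priced between $\$1.12$ and $\$1.58$ PPSF. The Bond on Mint and Landon leaves each hold only 5 to 10 training units, so their averages are less stable than those of the South End leaves.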
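Because several leaves rest on only 5 to 10 training units, one way to probe persona stability is to route the held-out rows through the same tree and compare leaf averages. The sketch below does this with `apply()` on hypothetical synthetic data. The `leaf_table` helper is illustrative. In the notebook, the same comparison would use `tree_ppsf` with `X_train_p`/`y_train_p` and `X_test_p`/`y_test_p`.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Hypothetical data: a size threshold at 900 sqft plus noise.
rng = np.random.default_rng(7)
sqft = rng.uniform(500, 1500, 400)
ppsf = np.where(sqft <= 900, 2.6, 1.8) + rng.normal(0, 0.15, 400)
X = pd.DataFrame({"Sqft": sqft})
X_tr, X_te, y_tr, y_te = train_test_split(X, ppsf, test_size=0.2, random_state=42)
demo_tree = DecisionTreeRegressor(max_depth=2, min_samples_leaf=5, random_state=42).fit(X_tr, y_tr)

def leaf_table(tree, X, y):
    """Count, mean, and std of the target for each leaf the rows fall into."""
    return (pd.DataFrame({"leaf": tree.apply(X), "ppsf": y})
            .groupby("leaf")["ppsf"].agg(["count", "mean", "std"]))

# Leaves with no test rows show NaN in the *_test columns.
stability = leaf_table(demo_tree, X_tr, y_tr).join(
    leaf_table(demo_tree, X_te, y_te), lsuffix="_train", rsuffix="_test"
)
print(stability.round(2))
```

Large gaps between `mean_train` and `mean_test`, or leaves with few or no test rows, suggest that a persona is better treated as tentative.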

The tree’s structure highlights categorical anchors and size thresholds rather than a wide spread of features. Complex identity and square footage dominate, while bedrooms, amenities, and neighborhood indicators play little role in segmentation. This simplicity makes the model easy to interpret: if it’s a smaller unit outside The Landon and Bond on Mint (most often at Ello House), expect a starter premium; if it’s Bond on Mint, expect Uptown luxury; if it’s The Landon, expect SouthPark value.

For stakeholders, the takeaway is straightforward. Regression quantified the global drivers first, showing how bedrooms, amenities, and size shape PPSF across the city. The decision tree then translated those drivers into personas, giving a narrative framework: South End starter premium, Uptown luxury, and SouthPark value. Together, the models provide both statistical rigor and intuitive segmentation, clarifying how Charlotte’s rental pricing is built.

PPSF Decision Tree Rule Path

In [49]:
tree_rules = export_text(tree_ppsf, feature_names=list(X_ppsf.columns))
print(tree_rules)
|--- Complex_The Landon <= 0.50
|   |--- Complex_Bond on Mint <= 0.50
|   |   |--- Sqft <= 1001.00
|   |   |   |--- value: [2.43]
|   |   |--- Sqft >  1001.00
|   |   |   |--- value: [2.02]
|   |--- Complex_Bond on Mint >  0.50
|   |   |--- Sqft <= 751.50
|   |   |   |--- value: [3.23]
|   |   |--- Sqft >  751.50
|   |   |   |--- value: [2.88]
|--- Complex_The Landon >  0.50
|   |--- Sqft <= 1071.50
|   |   |--- Sqft <= 877.00
|   |   |   |--- value: [1.58]
|   |   |--- Sqft >  877.00
|   |   |   |--- value: [1.38]
|   |--- Sqft >  1071.50
|   |   |--- Bedrooms <= 2.50
|   |   |   |--- value: [1.12]
|   |   |--- Bedrooms >  2.50
|   |   |   |--- value: [1.26]

Charlotte’s PPSF decision tree produces a clean set of rules that segment units into distinct pricing personas. The model begins by splitting on complex identity, then refines predictions with square footage and, in some cases, bedroom count. Each leaf represents a final segment with its own average PPSF.

The first split separates The Landon from all other complexes. Non-Landon units then divide by Bond on Mint membership and size. Smaller units (≤ 1,001 sqft) outside Bond on Mint, predominantly in South End, average $\$2.43$ PPSF, while larger layouts fall to $\$2.02$. Bond on Mint defines the Uptown premium, with smaller units (≤ 751.5 sqft) reaching $\$3.23$ PPSF and larger ones $\$2.88$.

Within The Landon, square footage drives the next segmentation. Units of 877 sqft or less average $\$1.58$ PPSF, units between 877 and 1,071.5 sqft average $\$1.38$, and larger layouts fall to $\$1.12$ with two or fewer bedrooms or $\$1.26$ with three or more. Across all sizes, The Landon consistently anchors the value tier.
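Because the tree is shallow, its rule path can be transcribed by hand into an ordinary function. The `predict_ppsf_overall` helper below is hypothetical. Its thresholds and leaf values are copied from the `export_text` output above rather than re-estimated. Checking only the complex name is equivalent to the tree’s dummy-variable tests because the tree splits only on the The Landon and Bond on Mint indicators.

```python
def predict_ppsf_overall(complex_name: str, sqft: float, bedrooms: float) -> float:
    """Hand-transcribed copy of the overall PPSF tree rules (illustrative only)."""
    if complex_name != "The Landon":
        if complex_name != "Bond on Mint":
            # Other complexes: split on size only
            return 2.43 if sqft <= 1001.0 else 2.02
        # Bond on Mint: Uptown premium, smaller units priced higher
        return 3.23 if sqft <= 751.5 else 2.88
    # The Landon: value tier, refined by size and, for the largest units, bedrooms
    if sqft <= 1071.5:
        return 1.58 if sqft <= 877.0 else 1.38
    return 1.12 if bedrooms <= 2.5 else 1.26

print(predict_ppsf_overall("Bond on Mint", 700, 1))  # 3.23
print(predict_ppsf_overall("The Landon", 1200, 3))   # 1.26
```

Multiplying the returned PPSF by a unit’s square footage gives the tree’s implied rent for that unit. This quick sanity check can be compared against listed rents.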

The structure highlights categorical anchors and size thresholds rather than a wide spread of features. Complex identity dominates, square footage refines, and bedrooms play only a minor role. Amenities and neighborhood indicators contribute little, reinforcing the tree’s simplicity.

For stakeholders, the narrative is straightforward. Regression was run first to establish the global drivers of PPSF, quantifying the effects of bedrooms, amenities, and square footage. The decision tree then builds on that foundation, translating those drivers into personas that are easy to interpret: South End starter premium, South End mid‑range, Uptown luxury, and SouthPark value. Together, the models provide both statistical rigor and intuitive segmentation, clarifying how Charlotte’s rental pricing is structured.

PPSF Decision Tree If-Then Rules

In [50]:
def extract_leaf_rules(tree, feature_names):
    tree_ = tree.tree_
    feature_name = [
        feature_names[i] if i != _tree.TREE_UNDEFINED else "undefined!"
        for i in tree_.feature
    ]

    paths = []
    path = []

    def recurse(node, path, paths):
        if tree_.feature[node] != _tree.TREE_UNDEFINED:
            name = feature_name[node]
            threshold = tree_.threshold[node]
            left_path = list(path)
            left_path.append(f"{name} <= {threshold:.1f}")
            recurse(tree_.children_left[node], left_path, paths)

            right_path = list(path)
            right_path.append(f"{name} > {threshold:.1f}")
            recurse(tree_.children_right[node], right_path, paths)
        else:
            value = tree_.value[node][0][0]
            rule = " AND ".join(path)
            paths.append((rule, round(value, 3)))

    recurse(0, path, paths)
    return paths

# Run and print
rules = extract_leaf_rules(tree_ppsf, list(X_ppsf.columns))
for rule, value in rules:
    print(f"If {rule} → PPSF ≈ {value}")
If Complex_The Landon <= 0.5 AND Complex_Bond on Mint <= 0.5 AND Sqft <= 1001.0 → PPSF ≈ 2.431
If Complex_The Landon <= 0.5 AND Complex_Bond on Mint <= 0.5 AND Sqft > 1001.0 → PPSF ≈ 2.024
If Complex_The Landon <= 0.5 AND Complex_Bond on Mint > 0.5 AND Sqft <= 751.5 → PPSF ≈ 3.227
If Complex_The Landon <= 0.5 AND Complex_Bond on Mint > 0.5 AND Sqft > 751.5 → PPSF ≈ 2.876
If Complex_The Landon > 0.5 AND Sqft <= 1071.5 AND Sqft <= 877.0 → PPSF ≈ 1.581
If Complex_The Landon > 0.5 AND Sqft <= 1071.5 AND Sqft > 877.0 → PPSF ≈ 1.375
If Complex_The Landon > 0.5 AND Sqft > 1071.5 AND Bedrooms <= 2.5 → PPSF ≈ 1.12
If Complex_The Landon > 0.5 AND Sqft > 1071.5 AND Bedrooms > 2.5 → PPSF ≈ 1.264

Expressed as if-then rules, Charlotte’s PPSF decision tree segments units into clear pricing personas based on complex identity and size thresholds. The model first asks whether a unit belongs to The Landon. It then refines predictions with Bond on Mint membership, square footage, and, for The Landon’s largest layouts, bedroom count.

Units outside The Landon split next on Bond on Mint. Among all other complexes, units of 1,001 sqft or less average about $\$2.43$ PPSF, while larger layouts fall to about $\$2.02$. Bond on Mint defines the Uptown luxury tier, with units of 751.5 sqft or less averaging $\$3.23$ PPSF and larger ones $\$2.88$.

Within The Landon, units consistently anchor the value tier. PPSF falls from about $\$1.58$ for units of 877 sqft or less to $\$1.38$ for units between 877 and 1,071.5 sqft. Above that size, units with two or fewer bedrooms average $\$1.12$ and those with three or more average $\$1.26$.

The structure highlights categorical anchors more than granular features. Complex identity drives the strongest splits, square footage refines predictions in every branch, and bedrooms enter only once. Amenities and neighborhood indicators play no role in segmentation, reinforcing the tree’s simplicity.

For stakeholders, the takeaway is straightforward. Regression was run first to establish the global drivers of PPSF, quantifying the effects of bedrooms, amenities, and square footage. The decision tree then builds on that foundation, translating those drivers into personas that are easy to interpret: South End starter premium, South End mid-range, Uptown luxury at Bond on Mint, and SouthPark value at The Landon. Together, the models provide both statistical rigor and intuitive segmentation, clarifying how Charlotte’s rental pricing is structured.

Conclusion on PPSF Decision Tree

Charlotte’s PPSF decision tree provides a segmented, rule-based view of how units are priced across the city. Complex identity emerges as the most decisive factor, with Bond on Mint anchoring the premium tier and The Landon consistently defining the value tier. Square footage refines predictions within every branch, reinforcing the familiar pattern that smaller units command higher PPSF while larger layouts dilute value. Bedrooms matter only for The Landon’s largest layouts, and amenities and neighborhood indicators contribute nothing, underscoring the tree’s simplicity.

The model complements regression by translating statistical drivers into clear personas. South End starter-premium units highlight the efficiency of smaller layouts, South End mid-range units show how added space dilutes PPSF, Uptown luxury reflects the strength of Bond on Mint, and The Landon anchors SouthPark value. Solis Midtown does not appear in the overall tree; it emerges as a distinct premium segment only when the analysis is restricted to small units. Regression was run first to quantify the global effects of bedrooms, amenities, and size, while the decision tree makes those effects tangible by showing how they segment the market. Together, the models provide both statistical rigor and intuitive storytelling, clarifying how Charlotte’s rental pricing is structured and where the strongest signals lie.

PPSF Small

In [51]:
# Drop non-predictive or text-heavy columns
apt_cleaned = apt.drop(columns=['Address', 'Unit_Variant', 'Amenities', 'Website'])

# Encode categorical columns
apt_encoded = pd.get_dummies(apt_cleaned, columns=['Complex', 'Neighborhood'], drop_first=False)

# Filter for small units (≤ 1010 sqft)
apt_small = apt_encoded[apt_encoded['Sqft'] <= 1010]

# Define target and features for PPSF (small units)
y_small = apt_small['price_per_sqft']
X_small = apt_small.drop(columns=['price_per_sqft'])

# Train/test split (distinct variable names for small units)
X_train_s, X_test_s, y_train_s, y_test_s = train_test_split(
    X_small, y_small, test_size=0.2, random_state=42
)

# Fit decision tree for PPSF (small units)
tree_small = DecisionTreeRegressor(max_depth=3, min_samples_leaf=5, random_state=42)
tree_small.fit(X_train_s, y_train_s)

# Visualize the tree
plt.figure(figsize=(36, 20))
plot_tree(
    tree_small,
    feature_names=X_small.columns,
    filled=True,
    rounded=True,
    fontsize=14
)
plt.title("PPSF Decision Tree — Small Units (≤ 1010 sqft)", fontsize=18)
plt.show()

# Feature importance
importances_small = pd.Series(tree_small.feature_importances_, index=X_small.columns)
print(importances_small.sort_values(ascending=False))

# R² on the held-out test set for the small-unit PPSF tree
r2_tree_small = tree_small.score(X_test_s, y_test_s)
print(f"\n📈 R² for PPSF Tree — Small Units: {r2_tree_small:.3f}")
[Figure: PPSF decision tree, small units (≤ 1010 sqft)]
Complex_The Landon              0.484874
Complex_Bond on Mint            0.322976
Complex_Solis Midtown           0.185790
Sqft                            0.006360
Rent                            0.000000
Bathrooms                       0.000000
Bedrooms                        0.000000
laundry                         0.000000
parking                         0.000000
ev_charging                     0.000000
gym                             0.000000
pool                            0.000000
secure_access                   0.000000
wifi                            0.000000
trash_pickup                    0.000000
wifi_common                     0.000000
renters_insurance               0.000000
packages                        0.000000
elevator                        0.000000
pets                            0.000000
Complex_Broadstone Craft        0.000000
recycling                       0.000000
Complex_Ello House              0.000000
Complex_Hawkins Press           0.000000
Complex_Novel Mallard Creek     0.000000
Complex_Moderna Liberty Row     0.000000
Complex_The Henry               0.000000
Complex_The Leo LoSo            0.000000
Complex_The Perch               0.000000
Complex_Tyvola Tapestry         0.000000
Neighborhood_NoDa               0.000000
Neighborhood_South End          0.000000
Neighborhood_SouthPark          0.000000
Neighborhood_University City    0.000000
Neighborhood_Uptown             0.000000
Neighborhood_West Charlotte     0.000000
dtype: float64

📈 R² for PPSF Tree — Small Units: 0.507

Charlotte’s PPSF decision tree for small units (≤ 1010 sqft) reveals a sparse, rule-based structure that segments the compact rental market with surprising clarity. Complex identity drives nearly all predictive power, with The Landon, Bond on Mint, and Solis Midtown anchoring distinct pricing tiers. Square footage plays a minor role, used only to refine predictions within Bond on Mint. All other features, including bedrooms, bathrooms, amenities, and neighborhood indicators, are excluded from the tree’s logic.

The first split isolates The Landon, which defines the SouthPark value tier with PPSF averaging $\$1.47$. Units outside The Landon divide further by complex. Bond on Mint anchors the Uptown luxury segment, with smaller units priced at $\$3.27$ PPSF and larger layouts at $\$3.07$. Solis Midtown forms a distinct premium tier at $\$2.90$ PPSF. All other small units, primarily in South End, fall into a mid‑range segment around $\$2.33$ PPSF.

This structure highlights the dominance of categorical anchors over granular features. The tree uses just four variables to explain PPSF variation, yielding a test-set R² of 0.507, below the overall tree’s 0.606. Its simplicity makes it easy to interpret: if it’s The Landon, expect SouthPark value; if it’s Bond on Mint, expect Uptown luxury; if it’s Solis Midtown, expect a premium tier; otherwise, expect South End mid-range.

For stakeholders, the takeaway is clear. Regression was run first to quantify the global effects of layout, amenities, and size. The decision tree then distilled those effects into personas that segment the small-unit market into interpretable tiers, although its R² of 0.507 means about half of the variation in small-unit PPSF remains unexplained. Together, the models provide both statistical rigor and intuitive clarity, showing how Charlotte’s compact rentals are priced.

Small PPSF Decision Tree Leaf Summary

In [52]:
def profile_tree_segments(tree, X, y, original_df, target_col, unit_col="Sqft", complex_col="Complex", neighborhood_col="Neighborhood"):
    # Get leaf node assignment for each row in X
    leaf_ids = tree.apply(X)
    
    profiles = []
    for leaf in np.unique(leaf_ids):
        mask = leaf_ids == leaf
        # Use the same index as X to select from original_df
        segment_df = original_df.loc[X.index[mask]]
        
        avg_target = segment_df[target_col].mean()
        avg_size   = segment_df[unit_col].mean()
        
        dominant_complex = segment_df[complex_col].mode()[0] if not segment_df.empty else None
        dominant_neigh   = segment_df[neighborhood_col].mode()[0] if not segment_df.empty else None
        
        profiles.append({
            "Leaf_ID": leaf,
            "Avg_" + target_col: round(avg_target, 2),
            "Typical_Size": round(avg_size, 0),
            "Dominant_Complex": dominant_complex,
            "Dominant_Neighborhood": dominant_neigh,
            "Count": len(segment_df)
        })
    
    return pd.DataFrame(profiles)

# Example usage for PPSF overall tree
segment_profiles_ppsf = profile_tree_segments(
    tree_ppsf,
    X_train_p,   # features used to fit
    y_train_p,   # target
    apt_cleaned, # original df with Sqft, Complex, Neighborhood
    target_col="price_per_sqft"
)

print(segment_profiles_ppsf)
   Leaf_ID  Avg_price_per_sqft  Typical_Size Dominant_Complex  \
0        3                2.43         721.0       Ello House   
1        4                2.02        1205.0     The Leo LoSo   
2        6                3.23         631.0     Bond on Mint   
3        7                2.88        1134.0     Bond on Mint   
4       10                1.58         769.0       The Landon   
5       11                1.38         980.0       The Landon   
6       13                1.12        1215.0       The Landon   
7       14                1.26        1338.0       The Landon   

  Dominant_Neighborhood  Count  
0             South End    103  
1             South End     54  
2                Uptown     10  
3                Uptown      7  
4             SouthPark      5  
5             SouthPark      6  
6             SouthPark      6  
7             SouthPark      5  

This leaf summary profiles the overall PPSF tree (`tree_ppsf`) rather than the small-unit tree, which is why its leaves include layouts well above 1010 sqft. It offers a practical lens into how pricing patterns emerge across the city’s rental landscape. Rather than modeling every nuance, it simplifies the market into a handful of recognizable profiles, each defined by where a unit is and how large it tends to be. The model doesn’t aim to explain everything; instead, it highlights the strongest signals and lets the rest fall away.

At the top end, Bond on Mint consistently commands Uptown premiums, with smaller units reaching $\$3.23$ PPSF and larger ones still at $\$2.88$. Ello House leads the South End starter segment, where compact layouts average $\$2.43$ PPSF. The Leo LoSo, with its larger footprints, settles into a mid‑range tier around $\$2.02$. The Landon, spanning a wide range of sizes, anchors SouthPark’s value segment, with PPSF generally declining as units grow, from $\$1.58$ at about 770 sqft to $\$1.12$ at about 1,215 sqft, before edging back up to $\$1.26$ for the largest layouts (about 1,340 sqft).
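The size-versus-efficiency trade-off at The Landon is easier to see once PPSF is multiplied back into an approximate monthly rent. This sketch uses the four Landon leaves from the table above; multiplying a leaf's average PPSF by its typical size only approximates the leaf's average rent (a product of averages is not the average of products), so treat the figures as rough.

```python
# (typical size in sqft, average PPSF) for the four The Landon leaves in the table above
landon_leaves = [(769, 1.58), (980, 1.38), (1215, 1.12), (1338, 1.26)]

# Approximate monthly rent per leaf: typical size x average PPSF (rough, see note above)
implied_rent = [round(size * ppsf, 2) for size, ppsf in landon_leaves]

# Does PPSF fall at every step as size grows?
ppsf_values = [ppsf for _, ppsf in landon_leaves]
strictly_declining = all(a > b for a, b in zip(ppsf_values, ppsf_values[1:]))

for (size, ppsf), rent in zip(landon_leaves, implied_rent):
    print(f"{size:>5} sqft  PPSF ${ppsf:.2f}  implied rent ≈ ${rent:,.2f}/month")
print("PPSF strictly declining:", strictly_declining)
```

Implied rent rises at every step even while PPSF falls, and the 1338 sqft leaf's small uptick is why the decline is general rather than strictly monotonic.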

What’s striking isn’t just who’s included, but who’s left out. Bedrooms, amenities, and even neighborhood indicators play no role in the tree’s logic. The model zeroes in on complex identity and square footage, using them to carve out pricing personas that are easy to interpret and act on.

Small PPSF Decision Tree Rule Paths

In [53]:
# Assuming your tree is named tree_small and was fit on X_small
tree_rules = export_text(tree_small, feature_names=list(X_small.columns), decimals=3)
print(tree_rules)
|--- Complex_The Landon <= 0.500
|   |--- Complex_Bond on Mint <= 0.500
|   |   |--- Complex_Solis Midtown <= 0.500
|   |   |   |--- value: [2.334]
|   |   |--- Complex_Solis Midtown >  0.500
|   |   |   |--- value: [2.899]
|   |--- Complex_Bond on Mint >  0.500
|   |   |--- Sqft <= 713.000
|   |   |   |--- value: [3.272]
|   |   |--- Sqft >  713.000
|   |   |   |--- value: [3.065]
|--- Complex_The Landon >  0.500
|   |--- value: [1.471]

Charlotte’s PPSF decision tree for small units (< 1010 sqft) produces a segmented, rule‑based view of how compact rentals are priced across the city. Complex identity drives the strongest splits, with The Landon, Bond on Mint, and Solis Midtown anchoring distinct tiers. Square footage plays only a minor role, refining predictions within Bond on Mint, while bedrooms, amenities, and neighborhood indicators are excluded entirely.

The first split isolates The Landon, which anchors the SouthPark value tier at $\$1.47$ PPSF across all sizes. Units outside The Landon divide further by complex. South End mid‑range units average $\$2.33$ PPSF, while Solis Midtown defines a premium tier at $\$2.90$. Bond on Mint anchors Uptown luxury, with smaller units priced at $\$3.27$ PPSF and larger layouts at $\$3.07$.

The tree’s structure is sparse but interpretable. It relies on categorical anchors and square footage alone, translating statistical splits into clear personas: South End mid‑range, Solis Midtown premium, Uptown luxury (small and large), and SouthPark value.

For stakeholders, the takeaway is straightforward. Regression quantified the global effects of layout, amenities, and size, while the decision tree distilled those effects into intuitive, leaf‑level segments that clarify how value clusters within Charlotte’s small‑unit rental market. Together, the models provide both statistical rigor and narrative clarity.

Small PPSF Decision Tree If-Then Rules

In [54]:
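from sklearn.tree import _tree  # provides TREE_UNDEFINED, the feature index sklearn stores at leaf nodes

def extract_leaf_rules(tree, feature_names):
    """Return one (rule, predicted value) pair per leaf by walking the fitted tree's node arrays."""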

    tree_ = tree.tree_
    feature_name = [
        feature_names[i] if i != _tree.TREE_UNDEFINED else "undefined!"
        for i in tree_.feature
    ]

    paths = []
    path = []

    def recurse(node, path, paths):
        if tree_.feature[node] != _tree.TREE_UNDEFINED:
            name = feature_name[node]
            threshold = tree_.threshold[node]
            left_path = list(path)
            left_path.append(f"{name} <= {threshold:.1f}")
            recurse(tree_.children_left[node], left_path, paths)

            right_path = list(path)
            right_path.append(f"{name} > {threshold:.1f}")
            recurse(tree_.children_right[node], right_path, paths)
        else:
            value = tree_.value[node][0][0]
            rule = " AND ".join(path)
            paths.append((rule, round(value, 3)))

    recurse(0, path, paths)
    return paths

# Run and print
rules = extract_leaf_rules(tree_small, list(X_small.columns))
for rule, value in rules:
    print(f"If {rule} → PPSF ≈ {value}")
If Complex_The Landon <= 0.5 AND Complex_Bond on Mint <= 0.5 AND Complex_Solis Midtown <= 0.5 → PPSF ≈ 2.334
If Complex_The Landon <= 0.5 AND Complex_Bond on Mint <= 0.5 AND Complex_Solis Midtown > 0.5 → PPSF ≈ 2.899
If Complex_The Landon <= 0.5 AND Complex_Bond on Mint > 0.5 AND Sqft <= 713.0 → PPSF ≈ 3.272
If Complex_The Landon <= 0.5 AND Complex_Bond on Mint > 0.5 AND Sqft > 713.0 → PPSF ≈ 3.065
If Complex_The Landon > 0.5 → PPSF ≈ 1.471
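`extract_leaf_rules` works because a fitted scikit-learn tree stores its structure as parallel node arrays (`children_left`, `children_right`, `feature`, `threshold`, `value`), with leaves marked by a child index of -1. The sketch below hand-builds hypothetical arrays that reproduce the small-unit rule paths printed above (the node numbering assumes sklearn's depth-first layout, and the feature indices are illustrative) and walks one unit from the root to its leaf, the same kind of traversal that produces a prediction.

```python
# Hypothetical node arrays shaped like a fitted sklearn tree's `tree_` attributes,
# reproducing the small-unit rule paths. Leaves have -1 children (sklearn's TREE_LEAF).
TREE_LEAF = -1
feature_names = ["Complex_The Landon", "Complex_Bond on Mint", "Complex_Solis Midtown", "Sqft"]

children_left  = [1, 2, 3, TREE_LEAF, TREE_LEAF, 6, TREE_LEAF, TREE_LEAF, TREE_LEAF]
children_right = [8, 5, 4, TREE_LEAF, TREE_LEAF, 7, TREE_LEAF, TREE_LEAF, TREE_LEAF]
feature        = [0, 1, 2, -2, -2, 3, -2, -2, -2]  # -2 marks "no feature" at a leaf
threshold      = [0.5, 0.5, 0.5, -2.0, -2.0, 713.0, -2.0, -2.0, -2.0]
leaf_value     = {3: 2.334, 4: 2.899, 6: 3.272, 7: 3.065, 8: 1.471}  # PPSF at each leaf


def trace(row):
    """Follow one unit from the root to a leaf; return (leaf id, tests passed, PPSF)."""
    node, steps = 0, []
    while children_left[node] != TREE_LEAF:
        name, cut = feature_names[feature[node]], threshold[node]
        if row[name] <= cut:  # sklearn sends values <= threshold to the left child
            steps.append(f"{name} <= {cut}")
            node = children_left[node]
        else:
            steps.append(f"{name} > {cut}")
            node = children_right[node]
    return node, steps, leaf_value[node]


unit = {"Complex_The Landon": 0, "Complex_Bond on Mint": 1,
        "Complex_Solis Midtown": 0, "Sqft": 650}
leaf, steps, ppsf = trace(unit)
print(f"Leaf {leaf}: If {' AND '.join(steps)} → PPSF ≈ {ppsf}")
```

On a real model the same arrays are available as `tree_small.tree_.children_left` and its siblings; they are typed by hand here so the sketch runs without scikit-learn.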

Charlotte’s PPSF decision tree for small units (< 1010 sqft) is expressed here as a series of rule paths, showing exactly how the model assigns average price per square foot. Each branch represents a decision point, and each leaf corresponds to a predicted outcome. The structure highlights how categorical anchors dominate the segmentation, with square footage appearing only once as a refinement.

The first split isolates The Landon, which consistently defines the SouthPark value tier at $\$1.47$ PPSF. Units outside The Landon divide further by complex identity. If a unit belongs to neither Bond on Mint nor Solis Midtown, the model assigns a mid‑range PPSF of $\$2.33$. Solis Midtown creates its own premium pathway, averaging $\$2.90$. Bond on Mint anchors Uptown luxury, with square footage introducing a threshold effect: smaller units at or below 713 sqft average $\$3.27$ PPSF, while larger layouts average $\$3.07$.

This rule‑based view reveals the hierarchy of drivers. Complex identity is the primary filter, Bond on Mint and Solis Midtown define premium pathways, and square footage refines only within Bond on Mint. Bedrooms, bathrooms, amenities, and neighborhood indicators never appear, indicating that, within this tree, compact‑unit PPSF is best predicted by complex identity and size thresholds rather than by unit features or location labels. Because each complex occupies a single location, the complex flags likely absorb much of the signal that neighborhood indicators would otherwise carry.

For stakeholders, this format serves as the model’s rulebook. It provides a transparent, step‑by‑step account of how predictions are made, complementing persona tables and narrative summaries by exposing the mechanics behind Charlotte’s small‑unit pricing tiers.

Conclusion for Small PPSF Decision Tree

Charlotte’s PPSF decision tree for small units ultimately shows that compact rental pricing is shaped less by granular features and more by categorical anchors. Complex identity dominates the segmentation, with The Landon, Bond on Mint, and Solis Midtown defining clear tiers, while square footage plays only a minor supporting role. Bedrooms, amenities, and neighborhood indicators are excluded entirely, reinforcing that, within this model, brand identity and size thresholds are the decisive factors in predicting PPSF.

For stakeholders, the conclusion is straightforward. The tree provides a transparent, rule‑based map of Charlotte’s small‑unit market, distilling statistical complexity into intuitive personas that are easy to interpret and act upon. Regression offers precision by quantifying global effects, but the decision tree translates those effects into recognizable profiles that clarify how value clusters across complexes and submarkets. Together, the models highlight that compact rentals are priced through identity and efficiency signals rather than through detailed unit attributes, giving both analytical rigor and narrative clarity to Charlotte’s rental landscape.

Large PPSF Decision Tree

In [55]:
# Drop non-predictive or text-heavy columns
apt_cleaned = apt.drop(columns=['Address', 'Unit_Variant', 'Amenities', 'Website'])

# Encode categorical columns
apt_encoded = pd.get_dummies(apt_cleaned, columns=['Complex', 'Neighborhood'], drop_first=False)

# Filter for large units (> 1010 sqft)
apt_large = apt_encoded[apt_encoded['Sqft'] > 1010]

# Define target and features for PPSF (large units)
y_large = apt_large['price_per_sqft']
X_large = apt_large.drop(columns=['price_per_sqft'])

# Train/test split (distinct variable names for large units)
X_train_l, X_test_l, y_train_l, y_test_l = train_test_split(
    X_large, y_large, test_size=0.2, random_state=42
)

# Fit decision tree for PPSF (large units)
tree_large = DecisionTreeRegressor(max_depth=4, min_samples_leaf=5, random_state=42)
tree_large.fit(X_train_l, y_train_l)

# Visualize the tree
plt.figure(figsize=(30, 16))
plot_tree(
    tree_large,
    feature_names=X_large.columns,
    filled=True,
    rounded=True,
    fontsize=12
)
plt.title("PPSF Decision Tree — Large Units (> 1010 sqft)", fontsize=18)
plt.show()

# Feature importance
importances_large = pd.Series(tree_large.feature_importances_, index=X_large.columns)
print("\n🔍 Feature Importances — Large Units (> 1010 sqft):\n")
print(importances_large.sort_values(ascending=False))

# R² score for PPSF tree (large units)
r2_tree_large = tree_large.score(X_test_l, y_test_l)
print(f"\n📈 R² for PPSF Tree — Large Units: {r2_tree_large:.3f}")
[Figure: PPSF decision tree for large units (> 1010 sqft)]
🔍 Feature Importances — Large Units (> 1010 sqft):

Rent                            0.927886
Sqft                            0.047403
Complex_The Leo LoSo            0.024711
Bathrooms                       0.000000
Bedrooms                        0.000000
laundry                         0.000000
gym                             0.000000
pool                            0.000000
parking                         0.000000
ev_charging                     0.000000
elevator                        0.000000
pets                            0.000000
wifi                            0.000000
wifi_common                     0.000000
trash_pickup                    0.000000
renters_insurance               0.000000
packages                        0.000000
recycling                       0.000000
Complex_Bond on Mint            0.000000
secure_access                   0.000000
Complex_Broadstone Craft        0.000000
Complex_Ello House              0.000000
Complex_Moderna Liberty Row     0.000000
Complex_Hawkins Press           0.000000
Complex_Novel Mallard Creek     0.000000
Complex_Solis Midtown           0.000000
Complex_The Henry               0.000000
Complex_The Landon              0.000000
Complex_The Perch               0.000000
Complex_Tyvola Tapestry         0.000000
Neighborhood_NoDa               0.000000
Neighborhood_South End          0.000000
Neighborhood_SouthPark          0.000000
Neighborhood_University City    0.000000
Neighborhood_Uptown             0.000000
Neighborhood_West Charlotte     0.000000
dtype: float64

📈 R² for PPSF Tree — Large Units: 0.916

Charlotte’s PPSF decision tree for large units (> 1010 sqft) reveals a strikingly different segmentation logic compared to its small‑unit counterpart. Instead of relying on complex identity or neighborhood clustering, this model is overwhelmingly driven by rent itself, which accounts for over 92 percent of the tree’s feature importance. Square footage contributes modestly (about 5 percent) and a single complex flag, The Leo LoSo, adds about 2.5 percent; all other features, including layout, amenities, the remaining complexes, and neighborhood, receive zero importance.

The tree’s structure reflects this prioritization. Every split in the first two levels is a rent threshold ($\$2573.50$ at the root, then $\$1792.50$ and $\$3252$), and a further rent split at $\$1537.50$ separates the lowest band. Square footage then refines three of the lower branches (at 1117.5, 1208.5, and 1250.5 sqft), and The Leo LoSo, the only complex with measurable influence, splits one branch in the $\$1792$–$\$2573$ band. All other complexes, amenities, and location flags register zero importance.

This rent‑dominant structure produces a highly accurate model, with an R² of 0.916. Part of that accuracy is mechanical: because PPSF is rent divided by square footage, a tree given both inputs is largely reconstructing that ratio rather than discovering an independent driver. For larger units, then, asking rent is not just a proxy for PPSF; it carries most of the signal. The tree does not need to infer value from amenities or layout because rent already encodes those effects. In contrast to the small‑unit tree, which segments by identity and size, the large‑unit tree behaves more like a pricing validator: if rent is high, predicted PPSF will be high, regardless of where the unit is or what it offers.
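A small simulation makes the mechanical part of this result concrete. It is a hypothetical illustration, not the notebook's data: PPSF is drawn at random and independent of size, rent is set to PPSF multiplied by square footage, and equal‑count bins stand in for tree splits (a deliberate simplification of CART, so the numbers are only indicative). Grouping by rent still explains most of the variance in PPSF, while grouping by size alone explains almost none.

```python
import random
from statistics import fmean, pvariance


def binned_r2(x, y, n_bins=8):
    """In-sample R^2 of a piecewise-constant predictor built on x.

    Units are sorted by x and cut into equal-count bins; each bin predicts its
    mean y. This is a crude stand-in for a regression tree's splits.
    """
    order = sorted(range(len(x)), key=lambda i: x[i])
    size = len(order) // n_bins
    preds = [0.0] * len(y)
    for b in range(n_bins):
        start = b * size
        end = len(order) if b == n_bins - 1 else start + size  # last bin takes leftovers
        idx = order[start:end]
        bin_mean = fmean(y[i] for i in idx)
        for i in idx:
            preds[i] = bin_mean
    sse = sum((yi - pi) ** 2 for yi, pi in zip(y, preds))
    sst = pvariance(y) * len(y)  # total sum of squares
    return 1 - sse / sst


# Hypothetical large-unit data: PPSF is random and unrelated to size by construction
rng = random.Random(42)
n = 1000
sqft = [rng.uniform(1010, 1500) for _ in range(n)]
ppsf = [rng.uniform(1.0, 3.5) for _ in range(n)]
rent = [p * s for p, s in zip(ppsf, sqft)]  # Rent = PPSF x Sqft, so PPSF = Rent / Sqft

r2_from_rent = binned_r2(rent, ppsf)
r2_from_sqft = binned_r2(sqft, ppsf)
print(f"R^2 grouping by rent: {r2_from_rent:.2f}")
print(f"R^2 grouping by size: {r2_from_sqft:.2f}")
```

Because the simulated PPSF carries no information about complex, amenities, or location, the high score from rent alone shows how much of a rent‑based tree's R² can come from the ratio itself.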

For stakeholders, this model serves as a diagnostic tool. It confirms that rent is the dominant predictor of PPSF for large units and that, once rent and size are known, additional features add no measurable predictive value in this tree. It also reinforces the idea that pricing logic shifts with unit size. What matters for small units, such as identity and compactness, is not what matters for large ones, where rent thresholds take precedence. This insight can guide both pricing strategy and feature prioritization across Charlotte’s rental landscape.

Large PPSF Decision Tree Leaf Summary

In [56]:
def profile_tree_segments(tree, X, y, original_df, target_col, unit_col="Sqft", complex_col="Complex", neighborhood_col="Neighborhood"):
    # Get leaf node assignment for each row in X
    leaf_ids = tree.apply(X)
    
    profiles = []
    for leaf in np.unique(leaf_ids):
        mask = leaf_ids == leaf
        # Use the same index as X to select from original_df
        segment_df = original_df.loc[X.index[mask]]
        
        avg_target = segment_df[target_col].mean()
        avg_size   = segment_df[unit_col].mean()
        
        dominant_complex = segment_df[complex_col].mode()[0] if not segment_df.empty else None
        dominant_neigh   = segment_df[neighborhood_col].mode()[0] if not segment_df.empty else None
        
        profiles.append({
            "Leaf_ID": leaf,
            "Avg_" + target_col: round(avg_target, 2),
            "Typical_Size": round(avg_size, 0),
            "Dominant_Complex": dominant_complex,
            "Dominant_Neighborhood": dominant_neigh,
            "Count": len(segment_df)
        })
    
    return pd.DataFrame(profiles)

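# Note: this call profiles the overall PPSF tree (tree_ppsf), so the output below repeats the
# earlier leaf summary; pass tree_large, X_train_l, y_train_l to profile the large-unit tree instead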
segment_profiles_ppsf = profile_tree_segments(
    tree_ppsf,
    X_train_p,   # features used to fit
    y_train_p,   # target
    apt_cleaned, # original df with Sqft, Complex, Neighborhood
    target_col="price_per_sqft"
)

print(segment_profiles_ppsf)
   Leaf_ID  Avg_price_per_sqft  Typical_Size Dominant_Complex  \
0        3                2.43         721.0       Ello House   
1        4                2.02        1205.0     The Leo LoSo   
2        6                3.23         631.0     Bond on Mint   
3        7                2.88        1134.0     Bond on Mint   
4       10                1.58         769.0       The Landon   
5       11                1.38         980.0       The Landon   
6       13                1.12        1215.0       The Landon   
7       14                1.26        1338.0       The Landon   

  Dominant_Neighborhood  Count  
0             South End    103  
1             South End     54  
2                Uptown     10  
3                Uptown      7  
4             SouthPark      5  
5             SouthPark      6  
6             SouthPark      6  
7             SouthPark      5  

This leaf summary comes from the overall PPSF tree (`tree_ppsf`) rather than `tree_large`, so it repeats the earlier table and includes units well below 1010 sqft. Read as a citywide view, each leaf still represents a distinct combination of size, complex identity, and neighborhood context, translating statistical splits into recognizable market segments.

Ello House and The Leo LoSo define South End’s tiers. Starter layouts at Ello House (about 720 sqft) average $\$2.43$ PPSF, while larger footprints at The Leo LoSo (about 1,205 sqft) settle into a mid‑range tier at $\$2.02$. Bond on Mint anchors Uptown luxury, with compact units reaching $\$3.23$ PPSF and more spacious layouts averaging $\$2.88$. The Landon dominates SouthPark’s value segment, where PPSF generally declines as units grow: $\$1.58$ for entry sizes, $\$1.38$ for mid‑sized, and $\$1.12$ for large layouts, with a slight rebound to $\$1.26$ for the extra‑large leaf.

The structure of the tree reveals several patterns. Complex identity and neighborhood clustering remain central, but size plays a stronger role in SouthPark, where PPSF mostly falls as square footage rises. Uptown premiums hold across both compact and spacious formats, while South End shows a clear gradient from starter to mid‑range tiers.

For stakeholders, this segmentation offers a straightforward framework. It shows how PPSF clusters into intuitive categories across unit sizes, clarifies the role of size in SouthPark, and reinforces the dominance of a few complexes in shaping neighborhood tiers. For large units specifically, the rent‑driven rule paths of `tree_large` give the more faithful picture.

Large PPSF Decision Tree Rule Paths

In [57]:
# Print the large-unit tree's splits using the feature names it was fit on (X_large)
tree_rules = export_text(tree_large, feature_names=list(X_large.columns), decimals=3)
print(tree_rules)
|--- Rent <= 2573.500
|   |--- Rent <= 1792.500
|   |   |--- Rent <= 1537.500
|   |   |   |--- value: [1.133]
|   |   |--- Rent >  1537.500
|   |   |   |--- value: [1.426]
|   |--- Rent >  1792.500
|   |   |--- Complex_The Leo LoSo <= 0.500
|   |   |   |--- Sqft <= 1208.500
|   |   |   |   |--- value: [2.037]
|   |   |   |--- Sqft >  1208.500
|   |   |   |   |--- value: [1.820]
|   |   |--- Complex_The Leo LoSo >  0.500
|   |   |   |--- value: [1.663]
|--- Rent >  2573.500
|   |--- Rent <= 3252.000
|   |   |--- Sqft <= 1117.500
|   |   |   |--- value: [2.632]
|   |   |--- Sqft >  1117.500
|   |   |   |--- value: [2.233]
|   |--- Rent >  3252.000
|   |   |--- Sqft <= 1250.500
|   |   |   |--- value: [2.988]
|   |   |--- Sqft >  1250.500
|   |   |   |--- value: [2.716]

Charlotte’s PPSF decision tree for large units (> 1010 sqft) is expressed here as a set of rent-driven rule paths, but what stands out is how threshold effects shape the pricing tiers. Instead of complex identity or amenities, the model relies almost entirely on rent brackets, with square footage and one complex flag (Leo LoSo) appearing only as refinements.

At the lower end, units with rents below $\$1537$ average just $\$1.13$ PPSF, while those between $\$1537$ and $\$1792$ rise to $\$1.43$. Crossing the $\$1792$ threshold introduces more variation: non-Leo LoSo units split by size, with smaller footprints averaging $\$2.04$ PPSF and larger ones dropping to $\$1.82$. Leo LoSo itself defines a distinct mid-tier at $\$1.66$ PPSF, showing that even within rent-based logic, one complex can carve out its own pricing identity.

Above $\$2573$, the tree highlights how square footage interacts with higher rents. Units between $\$2573$ and $\$3252$ split into compact layouts at $\$2.63$ PPSF and larger ones at $\$2.23$. Beyond $\$3252$, the premium tier emerges: smaller units average $\$2.99$ PPSF, while larger ones settle at $\$2.72$. This pattern reveals a consistent size discount: as units grow, PPSF declines, even at the top end of the rent spectrum.

What is new here is the layered role of rent thresholds. Rent is not just the dominant predictor; it creates stepwise tiers that mimic market brackets. Square footage then fine-tunes within those brackets, reinforcing the idea that PPSF for large units is governed by a combination of asking-rent bands and size discounts, with complex identity only occasionally intervening. For stakeholders, this shows that pricing strategy for large units hinges less on amenities or branding and more on how rent thresholds and square footage interact to define value.
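The rent brackets and size refinements can be transcribed into a plain function. This is a minimal sketch copied from the large‑unit tree's printed thresholds and leaf values; the name `large_unit_ppsf` and the boolean `is_leo_loso` (standing in for the `Complex_The Leo LoSo` dummy) are illustrative assumptions, not notebook code.

```python
def large_unit_ppsf(rent: float, sqft: float, is_leo_loso: bool) -> float:
    """Predicted PPSF for a unit over 1010 sqft, transcribed from the large-unit tree's rules."""
    if rent <= 2573.5:
        if rent <= 1792.5:
            # Lowest bands: rent alone decides
            return 1.133 if rent <= 1537.5 else 1.426
        if is_leo_loso:
            return 1.663  # the only complex flag the tree uses
        return 2.037 if sqft <= 1208.5 else 1.820
    if rent <= 3252.0:
        return 2.632 if sqft <= 1117.5 else 2.233
    # Premium band: compact layouts keep a higher PPSF
    return 2.988 if sqft <= 1250.5 else 2.716


examples = [(1450, 1100, False), (2100, 1150, True), (2100, 1300, False), (3600, 1200, False)]
for rent, sqft, leo in examples:
    print(f"rent {rent}, {sqft} sqft, Leo LoSo={leo} -> PPSF ≈ {large_unit_ppsf(rent, sqft, leo)}")
```

Comparing a prediction with the unit's own rent divided by its size (1450 / 1100 is about 1.32, against the leaf value of 1.133) is a quick way to see how coarse the leaf averages are.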

Large PPSF Decision Tree If-Then Rules

In [58]:
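from sklearn.tree import _tree  # provides TREE_UNDEFINED, the feature index sklearn stores at leaf nodes

def extract_leaf_rules(tree, feature_names):
    """Return one (rule, predicted value) pair per leaf by walking the fitted tree's node arrays."""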

    tree_ = tree.tree_
    feature_name = [
        feature_names[i] if i != _tree.TREE_UNDEFINED else "undefined!"
        for i in tree_.feature
    ]

    paths = []
    path = []

    def recurse(node, path, paths):
        if tree_.feature[node] != _tree.TREE_UNDEFINED:
            name = feature_name[node]
            threshold = tree_.threshold[node]
            left_path = list(path)
            left_path.append(f"{name} <= {threshold:.1f}")
            recurse(tree_.children_left[node], left_path, paths)

            right_path = list(path)
            right_path.append(f"{name} > {threshold:.1f}")
            recurse(tree_.children_right[node], right_path, paths)
        else:
            value = tree_.value[node][0][0]
            rule = " AND ".join(path)
            paths.append((rule, round(value, 3)))

    recurse(0, path, paths)
    return paths

# Run and print
rules = extract_leaf_rules(tree_large, list(X_large.columns))
for rule, value in rules:
    print(f"If {rule} → PPSF ≈ {value}")
If Rent <= 2573.5 AND Rent <= 1792.5 AND Rent <= 1537.5 → PPSF ≈ 1.133
If Rent <= 2573.5 AND Rent <= 1792.5 AND Rent > 1537.5 → PPSF ≈ 1.426
If Rent <= 2573.5 AND Rent > 1792.5 AND Complex_The Leo LoSo <= 0.5 AND Sqft <= 1208.5 → PPSF ≈ 2.037
If Rent <= 2573.5 AND Rent > 1792.5 AND Complex_The Leo LoSo <= 0.5 AND Sqft > 1208.5 → PPSF ≈ 1.82
If Rent <= 2573.5 AND Rent > 1792.5 AND Complex_The Leo LoSo > 0.5 → PPSF ≈ 1.663
If Rent > 2573.5 AND Rent <= 3252.0 AND Sqft <= 1117.5 → PPSF ≈ 2.632
If Rent > 2573.5 AND Rent <= 3252.0 AND Sqft > 1117.5 → PPSF ≈ 2.233
If Rent > 2573.5 AND Rent > 3252.0 AND Sqft <= 1250.5 → PPSF ≈ 2.988
If Rent > 2573.5 AND Rent > 3252.0 AND Sqft > 1250.5 → PPSF ≈ 2.716

Charlotte’s PPSF decision tree for large units (> 1010 sqft) shows a clear rent‑tiered structure, but what is new here is how stepwise rent bands interact with square footage to create predictable PPSF outcomes. The lowest tier, defined by rents under $\$1537$, averages just $\$1.13$ PPSF, while units between $\$1537$ and $\$1792$ rise modestly to $\$1.43$. This illustrates how the model treats rent thresholds as discrete brackets rather than a continuous gradient.

Crossing into the $\$1792$–$\$2573$ range introduces more nuance. Non‑Leo LoSo units split by size, with smaller footprints averaging $\$2.04$ PPSF and larger ones dropping to $\$1.82$. Leo LoSo itself defines a distinct mid‑tier at $\$1.66$ PPSF, showing that even within rent‑driven logic, one complex can carve out its own identity.

Above $\$2573$, the tree highlights a consistent size discount effect. Units between $\$2573$ and $\$3252$ average $\$2.63$ PPSF if compact, but only $\$2.23$ if larger. Beyond $\$3252$, the premium tier emerges: smaller units reach $\$2.99$ PPSF, while larger ones settle at $\$2.72$. This pattern confirms that as units grow, PPSF declines, even at the top end of the rent spectrum.
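The size discount within each bracket can be put in percentage terms using the leaf values listed above. This sketch simply compares the compact and larger leaf means in each bracket; it is descriptive arithmetic, not a controlled estimate of the effect of size.

```python
# (compact-leaf PPSF, larger-leaf PPSF) within each rent bracket, from the rule paths above
brackets = {
    "1792-2573, non-Leo LoSo": (2.037, 1.820),
    "2573-3252": (2.632, 2.233),
    "above 3252": (2.988, 2.716),
}

# Size discount: percentage drop in PPSF when moving from the compact to the larger leaf
discounts = {name: (compact - larger) / compact * 100
             for name, (compact, larger) in brackets.items()}

for name, pct in discounts.items():
    print(f"{name:<24} size discount ≈ {pct:.1f}%")
```

All three brackets show a discount, but it is steepest in the middle band (about 15 percent) and smaller in the premium band (about 9 percent).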

What this view adds is the layered interaction between rent thresholds and square footage. Rent alone sets the bracket, but size determines whether a unit captures the premium or slides into a discount within that bracket. Complex identity plays only a minor role, surfacing once with Leo LoSo, while amenities and neighborhood flags remain absent. For stakeholders, this means that large‑unit pricing is best understood as a combination of rent bands and size discounts, with complex identity rarely altering the trajectory.

Conclusion for Large PPSF Decision Tree

Charlotte’s PPSF decision tree for large units ultimately demonstrates that rent thresholds are the dominant force shaping efficiency outcomes. Unlike the small‑unit tree, which leans heavily on complex identity, the large‑unit structure is overwhelmingly rent‑driven, with square footage acting as a secondary refinement and complex identity appearing only in isolated cases such as Leo LoSo. This produces a highly accurate model that organizes the market into stepwise brackets, where PPSF rises or falls predictably as rent bands and size interact.

For stakeholders, the conclusion is clear. Large‑unit pricing is best understood through the layered logic of rent thresholds and size discounts, not through amenities or branding. Rent itself encodes the effects of positioning and features, while square footage moderates efficiency within each bracket. The result is a transparent framework that validates how PPSF declines as units grow, even at the premium end of the market. Together with the small‑unit tree, this model highlights a shift in pricing logic by size category, offering both diagnostic clarity and strategic guidance for Charlotte’s rental landscape.

Rent

In [59]:
# Drop non-predictive or text-heavy columns
apt_cleaned = apt.drop(columns=['Address', 'Unit_Variant', 'Amenities', 'Website'])

# Encode categorical columns
apt_encoded = pd.get_dummies(apt_cleaned, columns=['Complex', 'Neighborhood'], drop_first=False)

# Define target and features for Rent
y_rent = apt_encoded['Rent']
X_rent = apt_encoded.drop(columns=['Rent'])

# Train/test split (distinct variable names for Rent)
X_train_r, X_test_r, y_train_r, y_test_r = train_test_split(
    X_rent, y_rent, test_size=0.2, random_state=42
)

# Fit decision tree for Rent
tree_rent = DecisionTreeRegressor(max_depth=3, min_samples_leaf=5, random_state=42)
tree_rent.fit(X_train_r, y_train_r)

# Visualize the tree
plt.figure(figsize=(36, 20))
plot_tree(
    tree_rent,
    feature_names=X_rent.columns,
    filled=True,
    rounded=True,
    fontsize=14
)
plt.title("Rent Decision Tree", fontsize=18)
plt.show()

# Feature importance
importances_rent = pd.Series(tree_rent.feature_importances_, index=X_rent.columns)
print(importances_rent.sort_values(ascending=False))

# R² score for Rent tree
r2_tree_rent = tree_rent.score(X_test_r, y_test_r)
print(f"R² for Rent Decision Tree: {r2_tree_rent:.3f}")
[Figure: Rent decision tree]
price_per_sqft                  0.54098
Sqft                            0.45902
Bathrooms                       0.00000
Bedrooms                        0.00000
laundry                         0.00000
pool                            0.00000
gym                             0.00000
pets                            0.00000
parking                         0.00000
ev_charging                     0.00000
elevator                        0.00000
secure_access                   0.00000
wifi                            0.00000
wifi_common                     0.00000
trash_pickup                    0.00000
renters_insurance               0.00000
packages                        0.00000
recycling                       0.00000
Complex_Bond on Mint            0.00000
Complex_Broadstone Craft        0.00000
Complex_Ello House              0.00000
Complex_Hawkins Press           0.00000
Complex_Moderna Liberty Row     0.00000
Complex_Novel Mallard Creek     0.00000
Complex_Solis Midtown           0.00000
Complex_The Henry               0.00000
Complex_The Landon              0.00000
Complex_The Leo LoSo            0.00000
Complex_The Perch               0.00000
Complex_Tyvola Tapestry         0.00000
Neighborhood_NoDa               0.00000
Neighborhood_South End          0.00000
Neighborhood_SouthPark          0.00000
Neighborhood_University City    0.00000
Neighborhood_Uptown             0.00000
Neighborhood_West Charlotte     0.00000
dtype: float64
R² for Rent Decision Tree: 0.747

Charlotte’s rent decision tree reveals a pricing logic that is both simple and striking: rent is predicted entirely by square footage and price per square foot, with all other features such as layout, amenities, complex identity, and neighborhood excluded from the model. Together, these two variables account for 100 percent of the tree’s feature importance, with price per square foot contributing 54 percent and square footage contributing 46 percent.

This structure produces a moderately strong model, with an R² of 0.747. While not as precise as the PPSF tree for large units, it still captures the core mechanics of rent formation. The tree splits first on square footage, then refines predictions using PPSF thresholds. For example, smaller units with low PPSF fall into the $\$1067$–$\$1246$ rent range, while larger units with high PPSF climb toward $\$2951$.

What stands out here is the absence of contextual nuance. Unlike the PPSF trees, which segment by complex and neighborhood, the rent tree treats all units as interchangeable once size and PPSF are known. That is largely arithmetic: rent is square footage multiplied by PPSF, so a tree given both inputs needs nothing else, and branding and location reach rent only indirectly, through PPSF. It also implies that PPSF is the more expressive metric, since it encodes the effects of identity, amenities, and market positioning, while rent mostly reflects that product.

For stakeholders, this model serves as a baseline validator. It confirms that rent can be predicted with reasonable accuracy using just two inputs, and that, once those two are known, additional features offer no incremental value. It also reinforces the idea that PPSF is the better lens for understanding market segmentation, while rent is best used for benchmarking and sanity checks.

Rent Decision Tree Leaf Summary

In [60]:
def profile_tree_segments_rent(tree, X, y, original_df, unit_col="Sqft", complex_col="Complex", neighborhood_col="Neighborhood"):
    """
    Build mini segment profiles for each leaf node of a fitted Rent decision tree.
    
    Parameters:
    - tree: fitted DecisionTreeRegressor
    - X: features used to fit the tree (DataFrame)
    - y: target values (Series)
    - original_df: the original DataFrame with unit size, complex, neighborhood, and Rent
    - unit_col: column name for unit size (default "Sqft")
    - complex_col: column name for complex (default "Complex")
    - neighborhood_col: column name for neighborhood (default "Neighborhood")
    """
    
    # Get leaf node assignment for each row in X
    leaf_ids = tree.apply(X)
    
    profiles = []
    for leaf in np.unique(leaf_ids):
        mask = leaf_ids == leaf
        # Align mask with original_df using X's index
        segment_df = original_df.loc[X.index[mask]]
        
        avg_rent = segment_df["Rent"].mean()
        avg_size = segment_df[unit_col].mean()
        
        dominant_complex = segment_df[complex_col].mode()[0] if not segment_df.empty else None
        dominant_neigh   = segment_df[neighborhood_col].mode()[0] if not segment_df.empty else None
        
        profiles.append({
            "Leaf_ID": leaf,
            "Avg_Rent": round(avg_rent, 2),
            "Typical_Size": round(avg_size, 0),
            "Dominant_Complex": dominant_complex,
            "Dominant_Neighborhood": dominant_neigh,
            "Count": len(segment_df)
        })
    
    return pd.DataFrame(profiles)

# Example usage for Rent tree
segment_profiles_rent = profile_tree_segments_rent(
    tree_rent,
    X_train_r,   # features used to fit
    y_train_r,   # target
    apt_cleaned  # original df with Sqft, Complex, Neighborhood, Rent
)

print(segment_profiles_rent)
   Leaf_ID  Avg_Rent  Typical_Size  Dominant_Complex Dominant_Neighborhood  Count
0        3   1236.48         852.0        The Landon             SouthPark      9
1        4   1618.76         730.0  Broadstone Craft             South End     72
2        6   1691.14         567.0      Bond on Mint                Uptown     20
3        7   2230.15         762.0     Hawkins Press             South End     20
4       10   1618.04        1205.0        The Landon             SouthPark     24
5       11   2262.16        1211.0         The Perch                Uptown     22
6       13   2847.53        1120.0        Ello House             South End     21
7       14   3792.62        1426.0     Solis Midtown        West Charlotte      8

Charlotte’s rent decision tree for multifamily units reveals a segmentation logic that is distinct from PPSF-based models. Instead of clustering by pricing efficiency, this tree organizes units into absolute rent tiers. Its splits use only square footage and PPSF, so complex identity enters indirectly, through the PPSF each complex commands, and surfaces in the leaves as the dominant complex and neighborhood. Each leaf represents a rent persona, with those dominant complexes and neighborhoods anchoring predictable price points.

South End emerges as the most stratified submarket, dominating three leaves that span starter premiums at Broadstone Craft ($\$1619$), mid-range tiers at Hawkins Press ($\$2230$), and luxury pricing at Ello House ($\$2848$). Uptown units split by size and complex, with Bond on Mint defining compact premiums ($\$1691$) and The Perch anchoring spacious formats ($\$2262$). SouthPark’s value tier is unusually broad, with The Landon dominating both a mid-sized segment ($\$1236$, about 852 sqft) and a large-unit segment ($\$1618$, about 1205 sqft), suggesting that rent discounts persist even as units grow. West Charlotte appears only at the top end, as the dominant neighborhood of the highest leaf, where Solis Midtown commands the highest average rent at $\$3793$ across just 8 units.
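The leaf personas above come from `profile_tree_segments_rent`, which loops over leaf IDs. The same kind of table can be sketched with `tree.apply` plus a pandas named-aggregation `groupby`. The frame, complex names, and neighborhoods below are hypothetical toy data, and the helper `first_mode` is introduced only for illustration.

```python
# Minimal sketch: leaf "persona" profiles via tree.apply + groupby instead of a loop.
# All data and labels are hypothetical, not the Charlotte dataset.
import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(1)
n = 200
units = pd.DataFrame({
    "Sqft": rng.uniform(450, 1600, n).round(),
    "price_per_sqft": rng.uniform(1.3, 3.2, n).round(3),
    "Complex": rng.choice(["Complex A", "Complex B", "Complex C"], n),
    "Neighborhood": rng.choice(["Area 1", "Area 2"], n),
})
units["Rent"] = units["Sqft"] * units["price_per_sqft"]

features = ["Sqft", "price_per_sqft"]
tree = DecisionTreeRegressor(max_depth=3, random_state=0).fit(units[features], units["Rent"])


def first_mode(s):
    """Most frequent label in a leaf; ties resolve to the first label in sorted order."""
    return s.mode().iloc[0]


profiles = (
    units.assign(Leaf_ID=tree.apply(units[features]))  # leaf index for every unit
    .groupby("Leaf_ID")
    .agg(
        Avg_Rent=("Rent", "mean"),
        Typical_Size=("Sqft", "mean"),
        Dominant_Complex=("Complex", first_mode),
        Dominant_Neighborhood=("Neighborhood", first_mode),
        Count=("Rent", "count"),
    )
    .round({"Avg_Rent": 2, "Typical_Size": 0})
    .reset_index()
)
print(profiles)
```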

What is newly revealed here is that rent segmentation tolerates more size overlap than PPSF segmentation. Smaller units appear in both value and premium tiers depending on their PPSF, which carries the complex and neighborhood effects: the leaf dominated by Hawkins Press (typical size 762 sqft) averages $\$2230$, well above the Landon-dominated leaf of roughly 1205 sqft units at $\$1618$. This suggests that rent is more elastic to branding and location than PPSF, which tends to penalize inefficiency more strictly. In other words, tenants may accept higher rents for smaller units if the complex or neighborhood justifies the premium, while PPSF still reflects the underlying spatial tradeoff.
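The size overlap can be put in numbers using only the leaf averages printed in the summary table. The sketch counts leaf pairs in which the smaller typical unit carries the higher average rent. Because these are leaf means rather than individual units, it is a coarse indicator, not a statistical test.

```python
# Minimal sketch: counting "size inversions" among rent-tree leaves, i.e. pairs where
# the leaf with the smaller typical unit has the higher average rent.
from itertools import combinations

# Leaf_ID: (Typical_Size, Avg_Rent), copied from the leaf summary output
leaves = {
    3: (852, 1236.48), 4: (730, 1618.76), 6: (567, 1691.14), 7: (762, 2230.15),
    10: (1205, 1618.04), 11: (1211, 2262.16), 13: (1120, 2847.53), 14: (1426, 3792.62),
}

# A pair is inverted when size and rent move in opposite directions
inversions = [
    (a, b)
    for a, b in combinations(leaves, 2)
    if (leaves[a][0] - leaves[b][0]) * (leaves[a][1] - leaves[b][1]) < 0
]
total_pairs = len(leaves) * (len(leaves) - 1) // 2
print(f"{len(inversions)} of {total_pairs} leaf pairs invert size and rent")
# prints "9 of 28 leaf pairs invert size and rent"
```

Roughly a third of leaf pairs run against the size ordering, which is consistent with the overlap described above.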

For stakeholders, this model provides a complementary lens. It shows how rent clusters into tiers of perceived value rather than tiers of pricing efficiency, and how certain complexes, like Ello House and Solis Midtown, push the rent ceiling where high PPSF coincides with large floor plans (about 1120 and 1426 sqft on average). It also reinforces the idea that rent is a behavioral metric, shaped by what tenants are willing to pay, while PPSF is a structural metric, shaped by how space is priced. Together, these models offer a dual perspective on Charlotte’s rental landscape: one behavioral, one architectural.

Rent Decision Tree Rule Paths

In [61]:
from sklearn.tree import export_text

# Assumes the rent tree is named tree_rent and was fit on features with the same columns as X_rent
tree_rules_rent = export_text(
    tree_rent,
    feature_names=list(X_rent.columns),
    decimals=3
)

print(tree_rules_rent)
|--- Sqft <= 995.000
|   |--- price_per_sqft <= 2.595
|   |   |--- price_per_sqft <= 1.714
|   |   |   |--- value: [1236.481]
|   |   |--- price_per_sqft >  1.714
|   |   |   |--- value: [1618.762]
|   |--- price_per_sqft >  2.595
|   |   |--- Sqft <= 653.500
|   |   |   |--- value: [1691.142]
|   |   |--- Sqft >  653.500
|   |   |   |--- value: [2230.150]
|--- Sqft >  995.000
|   |--- price_per_sqft <= 2.105
|   |   |--- price_per_sqft <= 1.662
|   |   |   |--- value: [1618.042]
|   |   |--- price_per_sqft >  1.662
|   |   |   |--- value: [2262.159]
|   |--- price_per_sqft >  2.105
|   |   |--- Sqft <= 1320.000
|   |   |   |--- value: [2847.530]
|   |   |--- Sqft >  1320.000
|   |   |   |--- value: [3792.625]

Charlotte’s rent decision tree, expressed through its rule paths, highlights a pricing logic that is both clear and behaviorally meaningful. Each split is driven by square footage or price per square foot, with the leaves representing distinct rent outcomes. What stands out in this structure is how layered thresholds (combinations of PPSF and size) create rent brackets that mirror tenant willingness to pay.

At the lower end, units of 995 sqft or less with PPSF at or below 1.714 fall into the value tier at $\$1236$. Crossing that PPSF threshold (up to 2.595) raises rent to $\$1619$, even for units of similar size. When PPSF exceeds 2.595, the model introduces another size split at 653.5 sqft, assigning compact units $\$1691$ and larger ones $\$2230$. This shows that high PPSF alone does not guarantee premium rent, since square footage still moderates the outcome.

For units above 995 sqft, PPSF continues to shape rent tiers. Units with PPSF at or below 1.662 average $\$1618$, while those between 1.662 and 2.105 reach $\$2262$. Once PPSF passes 2.105, the model adds another size split at 1320 sqft, assigning $\$2848$ to units of 1320 sqft or less and $\$3793$ to larger ones. This is consistent with a size premium, where larger units command higher absolute rents even when PPSF is already elevated.
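The rule paths can also be transcribed into a plain function, which makes the gating behavior easy to probe. This is a hand-written sketch, not an sklearn API: the function name `rent_tree_bracket` is hypothetical, and the thresholds are the three-decimal values printed by `export_text`, so a unit sitting exactly on a rounded threshold could be routed differently than the fitted tree would route it.

```python
# Minimal sketch: the printed rule paths as a plain function (hypothetical name).
# Thresholds and leaf values are copied from the export_text output; splits use "<="
# as sklearn does, but thresholds are rounded to three decimals as printed.
def rent_tree_bracket(sqft: float, ppsf: float) -> float:
    """Return the rent-tree leaf value for a unit of `sqft` square feet at `ppsf` $/sqft."""
    if sqft <= 995.0:
        if ppsf <= 2.595:
            return 1236.481 if ppsf <= 1.714 else 1618.762
        return 1691.142 if sqft <= 653.5 else 2230.150
    if ppsf <= 2.105:
        return 1618.042 if ppsf <= 1.662 else 2262.159
    return 2847.530 if sqft <= 1320.0 else 3792.625


# Same-size units land in different brackets as PPSF crosses each threshold
for sqft, ppsf in [(800, 1.60), (800, 2.00), (800, 2.80), (1400, 2.50)]:
    print(sqft, ppsf, rent_tree_bracket(sqft, ppsf))
```

Holding size at 800 sqft and raising PPSF walks a unit from $\$1236$ to $\$1619$ to $\$2230$, which is the gating pattern the rule paths describe.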

The new insight here is the dual function of PPSF. It operates as both a pricing signal and a behavioral filter. Lower PPSF values define entry tiers, while higher PPSF values unlock premium brackets that are then refined by square footage. The tree does not treat PPSF as a simple multiplier; it uses PPSF to gate access to higher rent categories, with size determining how far those premiums extend.

For stakeholders, this view provides a transparent map of rent formation. It illustrates how thresholds interact, how PPSF and size jointly shape outcomes, and how tenant behavior (what people are willing to pay for efficiency and space) drives segmentation more visibly than amenities or branding, whose effects reach the tree only through PPSF. In this way, the rent tree complements PPSF models by showing not just how space is priced, but how absolute rent brackets emerge from layered tradeoffs.

Rent Decision Tree Conclusion

Taken together, Charlotte’s rent decision tree offers a clear, layered view of how rent is formed, segmented, and scaled. It confirms that rent can be predicted with reasonable accuracy (R² of 0.747) using just two inputs, square footage and price per square foot, which together carry the behavioral and structural forces shaping the market because rent is their product. While PPSF models emphasize pricing logic and efficiency, the rent tree reveals how absolute rent brackets emerge from threshold effects, size premiums, and tenant willingness to pay.

This dual perspective is essential for stakeholders. Rent models provide clarity on what tenants actually pay, while PPSF models explain why those prices vary across space and context. The rent tree’s simplicity makes it a powerful tool for benchmarking and sanity checks, while its segmentation logic uncovers the behavioral tiers that define Charlotte’s rental landscape. By pairing rent and PPSF trees, analysts can move beyond prediction into interpretation, translating raw metrics into meaningful personas, pricing strategies, and market narratives.

Comparison of Decision Tree and Regression

In [62]:
import numpy as np
import pandas as pd
import statsmodels.api as sm

# Step 1: Drop non-useful columns
apt_cleaned = apt.drop(columns=["Address", "Unit_Variant", "Amenities", "Website"], errors="ignore")

# Step 2: One-hot encode categorical variables
# (with drop_first=False plus the constant added in Step 6, each dummy set is
# collinear with the intercept; statsmodels OLS still fits via pseudo-inverse,
# but individual dummy coefficients are not uniquely identified)
apt_encoded = pd.get_dummies(apt_cleaned, columns=["Complex", "Neighborhood"], drop_first=False)

# Step 3: Define target and features
target = "price_per_sqft"   # or "Rent" to model rent instead
X = apt_encoded.drop(columns=[target])
y = apt_encoded[target]

# Step 4: Force numeric conversion; text columns become all-NaN, so drop those
# columns first, then drop rows with any remaining missing values
df_model = pd.concat([X, y], axis=1)
df_model = df_model.apply(pd.to_numeric, errors="coerce")
df_model = df_model.dropna(axis=1, how="all").dropna()

# Step 5: Separate cleaned features and target, force float64 dtype
X_clean = df_model.drop(columns=[target]).astype(np.float64)
y_clean = df_model[target].astype(np.float64)

# Step 6: Add constant for regression intercept
X_clean = sm.add_constant(X_clean)
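Step 4 coerces every column to numeric before dropping rows. A small sketch on a hypothetical toy frame shows a pitfall worth checking: any text column that survives Step 1 becomes entirely NaN after coercion, and a plain row-wise `dropna()` then removes every row. Dropping all-NaN columns first keeps the valid rows. The `Layout` column and all values here are invented for illustration.

```python
# Minimal sketch on a hypothetical toy frame: coercion plus row-wise dropna()
# can silently empty the table when a text column survives the column drop.
import pandas as pd

toy = pd.DataFrame({
    "Complex": ["A", "B", "A"],
    "Neighborhood": ["Uptown", "NoDa", "NoDa"],
    "Layout": ["1BR", "2BR", "Studio"],  # hypothetical leftover text column
    "Sqft": [700, 1100, 500],
    "Rent": [1600, 2100, "call"],        # one non-numeric entry
})
toy["price_per_sqft"] = pd.to_numeric(toy["Rent"], errors="coerce") / toy["Sqft"]

encoded = pd.get_dummies(toy, columns=["Complex", "Neighborhood"], drop_first=False)
coerced = encoded.apply(pd.to_numeric, errors="coerce")  # "Layout" becomes all NaN

naive = coerced.dropna()                            # every row has NaN in "Layout"
safer = coerced.dropna(axis=1, how="all").dropna()  # drop empty columns, then bad rows
print(len(naive), len(safer))
```

Here the naive version keeps no rows, while the safer version keeps the two units with numeric rents.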
In [63]:
# -------------------------
# R² scores for each tree
# -------------------------
r2_tree_ppsf  = tree_ppsf.score(X_test_p, y_test_p)     # PPSF overall
r2_tree_small = tree_small.score(X_test_s, y_test_s)    # PPSF small units
r2_tree_large = tree_large.score(X_test_l, y_test_l)    # PPSF large units
r2_tree_rent  = tree_rent.score(X_test_r, y_test_r)     # Rent overall

# -------------------------
# Summary table
# -------------------------
tree_r2_summary = pd.DataFrame({
    "Model": ["PPSF Overall", "PPSF ≤ 1010 sqft", "PPSF > 1010 sqft", "Rent Overall"],
    "R_squared": [r2_tree_ppsf, r2_tree_small, r2_tree_large, r2_tree_rent]
})

resume_percent_tree = round(tree_r2_summary["R_squared"].mean() * 100, 2)

print(tree_r2_summary)
print(f"\n🧠 Resume % (Avg R² across trees): {resume_percent_tree}%")
              Model  R_squared
0      PPSF Overall   0.605674
1  PPSF ≤ 1010 sqft   0.507014
2  PPSF > 1010 sqft   0.916093
3      Rent Overall   0.747423

🧠 Resume % (Avg R² across trees): 69.41%
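The percentage above is a plain unweighted mean of the four tree R² values, so each tree counts equally regardless of how many test units it was scored on:

$$
\bar{R}^2_{\text{tree}} = \frac{0.605674 + 0.507014 + 0.916093 + 0.747423}{4} = \frac{2.776204}{4} \approx 0.6941
$$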
In [64]:
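import pandas as pd

# R² values for the 24 regression models, recorded from the regression runs and hard-coded here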
data = {
    "Model_3": 0.999747,
    "Model_6": 0.999401,
    "Model_20": 0.997807,
    "Model_10": 0.997807,
    "Model_5": 0.991060,
    "Model_24": 0.984648,
    "Model_16": 0.984114,
    "Model_4": 0.983144,
    "Model_22": 0.982825,
    "Model_23": 0.979736,
    "Model_9": 0.977991,
    "Model_18": 0.977991,
    "Model_17": 0.977600,
    "Model_12": 0.971687,
    "Model_19": 0.971687,
    "Model_21": 0.968111,
    "Model_14": 0.961383,
    "Model_7": 0.959890,
    "Model_8": 0.958345,
    "Model_13": 0.956600,
    "Model_1": 0.956078,
    "Model_2": 0.955864,
    "Model_15": 0.954173,
    "Model_11": 0.936356,
}

# Convert to DataFrame
df = pd.DataFrame(list(data.items()), columns=["Model", "R_squared"])

# Remove "Model_" prefix and convert to integer
df["Model"] = df["Model"].str.replace("Model_", "").astype(int)

# Sort by Model number
df = df.sort_values("Model").reset_index(drop=True)

# Compute average R²
average_r2 = df["R_squared"].mean()

# Add a row for the average
df.loc[len(df)] = ["Average", average_r2]

print(df)

# Print the summary line (average R² as a percentage)
brain_percent = round(average_r2 * 100, 2)
print(f"\n🧠 Resume % (avg R² across regressions): {brain_percent}%")
      Model  R_squared
0         1   0.956078
1         2   0.955864
2         3   0.999747
3         4   0.983144
4         5   0.991060
5         6   0.999401
6         7   0.959890
7         8   0.958345
8         9   0.977991
9        10   0.997807
10       11   0.936356
11       12   0.971687
12       13   0.956600
13       14   0.961383
14       15   0.954173
15       16   0.984114
16       17   0.977600
17       18   0.977991
18       19   0.971687
19       20   0.997807
20       21   0.968111
21       22   0.982825
22       23   0.979736
23       24   0.984648
24  Average   0.974335

🧠 Resume % (avg R² across regressions): 97.43%
In [65]:
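import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor

# --- Synthetic dataset (Sqft vs Rent) ---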
np.random.seed(42)
sqft = np.linspace(500, 1500, 100).reshape(-1, 1)
# True rent relationship: nonlinear with some noise
rent = 1.5 * sqft.flatten() + 500 + 200 * np.sin(sqft.flatten()/200) + np.random.normal(0, 100, 100)

# --- Decision Tree fit ---
tree = DecisionTreeRegressor(max_depth=3)
tree.fit(sqft, rent)
rent_tree_pred = tree.predict(sqft)

# --- Regression fit ---
reg = LinearRegression()
reg.fit(sqft, rent)
rent_reg_pred = reg.predict(sqft)

# --- Plot side-by-side ---
fig, axes = plt.subplots(1, 2, figsize=(12, 5), sharey=True)

# Decision Tree plot
axes[0].scatter(sqft, rent, color="gray", alpha=0.6, label="Observed")
axes[0].plot(sqft, rent_tree_pred, color="red", linewidth=2, label="Tree prediction")
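# R² values in both titles are the Charlotte model averages, not scores on this synthetic data
axes[0].set_title("Decision Tree Segmentation\nStepwise thresholds (avg Charlotte tree R² ≈ 0.69)")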
axes[0].set_xlabel("Sqft")
axes[0].set_ylabel("Rent")
axes[0].legend()

# Regression plot
axes[1].scatter(sqft, rent, color="gray", alpha=0.6, label="Observed")
axes[1].plot(sqft, rent_reg_pred, color="blue", linewidth=2, label="Regression fit")
axes[1].set_title("Regression Fit\nSmooth continuous line (avg Charlotte regression R² ≈ 0.97)")
axes[1].set_xlabel("Sqft")
axes[1].legend()

plt.tight_layout()
plt.show()

This side-by-side comparison illustrates, on synthetic data generated with a nonlinear relationship between square footage and rent, two distinct approaches to modeling rent: decision tree segmentation and regression fit. Each method offers a different lens into how rent responds to square footage, and each has its own strengths depending on the analytical goal.

In the left panel, the decision tree produces a stepwise red line that segments the data into discrete brackets based on square footage. Each horizontal segment represents a group of units within a specific size range that share the same predicted rent. This structure is highly interpretable because you can trace the logic directly. For example, in the Charlotte rent tree, a unit of 995 sqft or less with PPSF at or below 1.714 is assigned a predicted rent of about $\$1236$. This clarity makes decision trees especially useful for stakeholder presentations and market segmentation exercises. However, trees capture only broad patterns: the four Charlotte trees average an R² of 0.69 (the value quoted in the panel title), leaving more residual error than the regressions.

The right panel shows a regression fit, represented by a smooth blue line. Regression captures the overall trend with greater accuracy: the 24 Charlotte regressions average an R² of 0.97 (the value quoted in the panel title). In the panel, rent increases gradually and consistently with square footage. While statistically powerful, regression does not naturally produce personas or tiers. Stakeholders see a clean curve, but the behavioral thresholds that define market segments are less visible.

Taken together, these models highlight a key distinction. Decision trees are ideal for uncovering market segments and behavioral thresholds. They show how rent brackets emerge when square footage and PPSF cross certain cutoffs. Regression models, on the other hand, are better suited for forecasting and diagnostics. They offer precision and smoothness but require more interpretation to explain how the market organizes itself.
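Because the panel titles quote the Charlotte averages rather than scores on the synthetic data, it can be worth measuring the synthetic fits directly. The sketch regenerates the same data (same generator and seed) and reports in-sample R² for both models. Depending on the data, a stepwise tree can score higher or lower in-sample than a straight line, so these numbers should be read alongside, not instead of, the Charlotte averages.

```python
# Minimal sketch: in-sample R^2 of both fits on the synthetic Sqft-Rent data from the figure.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor

np.random.seed(42)
sqft = np.linspace(500, 1500, 100).reshape(-1, 1)
rent = 1.5 * sqft.ravel() + 500 + 200 * np.sin(sqft.ravel() / 200) + np.random.normal(0, 100, 100)

tree = DecisionTreeRegressor(max_depth=3, random_state=0).fit(sqft, rent)
reg = LinearRegression().fit(sqft, rent)

r2_tree = tree.score(sqft, rent)
r2_reg = reg.score(sqft, rent)
n_steps = len(np.unique(tree.predict(sqft)))  # a depth-3 tree has at most 2**3 = 8 leaves
print(f"tree R^2 = {r2_tree:.3f} ({n_steps} steps), regression R^2 = {r2_reg:.3f}")
```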

For a complete understanding of rent dynamics, both models are valuable. The tree reveals how the market behaves, while the regression shows how rent scales with space. Used together, they provide a dual perspective. One focuses on segmentation, the other on statistical fit.

Overall Conclusion

Charlotte’s decision trees provide a transparent, rule‑based framework for understanding how rental pricing is segmented across both PPSF and rent outcomes. They simplify complex statistical drivers into clear personas and thresholds, showing how square footage, rent bands, and complex identity interact to define value. For small units, identity anchors dominate, with complexes such as The Landon, Bond on Mint, and Solis Midtown shaping distinct tiers. For large units, rent thresholds take precedence, with square footage moderating efficiency and PPSF declining as units grow. The rent tree itself shows that size and PPSF alone carry the information needed to predict rent (R² of 0.747), reinforcing the arithmetic logic behind asking prices.

Compared to regression, decision trees trade some precision for interpretability. Regression quantifies global effects with high accuracy (an average R² of 0.97 across 24 models, versus 0.69 across the four trees), while decision trees make those effects tangible by segmenting the market into recognizable profiles and behavioral brackets. Together, they highlight complementary strengths: regression ensures statistical rigor, while decision trees reveal the market’s internal logic and provide intuitive storytelling for stakeholders.

For decision‑makers, this dual perspective is invaluable. Decision trees clarify how thresholds and identities shape segmentation, while regression validates the strength of those drivers. The result is a holistic framework that explains not just how Charlotte’s rental market can be predicted, but how it organizes itself into tiers of value and efficiency.